Abstract — The noncoherent capacity of stationary discrete-time 
fading channels is known to be very sensitive to the fine details of 
the channel model. More specifically, the measure of the support 
of the fading-process power spectral density (PSD) determines if 
noncoherent capacity grows logarithmically in SNR or slower than 
logarithmically. Such a result is unsatisfactory from an engineer- 
ing point of view, as the support of the PSD cannot be determined 
through measurements. The aim of this paper is to assess whether, 
for general continuous-time Rayleigh-fading channels, this
sensitivity has a noticeable impact on capacity at SNR values of
practical interest.

To this end, we consider the general class of band-limited
continuous-time Rayleigh-fading channels that satisfy the
wide-sense stationary uncorrelated-scattering (WSSUS) assumption
and are, in addition, underspread. We show that, for all SNR
values of practical interest, the noncoherent capacity of every
channel in this class is close to the capacity of an AWGN
channel with the same SNR and bandwidth, independently
of the measure of the support of the scattering function (the
two-dimensional channel PSD). Our result is based on a lower
bound on noncoherent capacity, which is built on a discretization
of the channel input-output relation induced by projecting onto
Weyl-Heisenberg (WH) sets. This approach is interesting in its
own right as it yields a mathematically tractable way of dealing
with the mutual information between certain continuous-time
random signals.

Index Terms — Continuous-time, ergodic capacity, fading chan- 
nels, Weyl-Heisenberg sets, wide-sense stationary uncorrelated- 
scattering, underspread property. 

I. Introduction and Summary of Results 

The capacity of fading channels in the noncoherent setting,
where neither transmitter nor receiver is aware of the
realizations of the fading process, but both know its statistics,¹ is
notoriously difficult to analyze, even for simple channel models.
Most of the results available in the literature pertain to either low
or high signal-to-noise ratio (SNR) asymptotics. While in the
low-SNR regime the capacity behavior is robust with respect to
the underlying channel model (see for example [1], [2]), this is
not the case in the high-SNR regime, where, as we are going to
argue next, capacity is very sensitive to the fine details of the
channel model.

¹ Capacity in the noncoherent setting is sometimes called noncoherent capacity;
in the remainder of this paper, it will be referred to simply as capacity. We will
use the adjective coherent to denote the setting where the channel realizations
are perfectly known at the receiver but unknown at the transmitter, which is
assumed to know the channel statistics only.

Fig. 1. Two channels with similar PSD $c(\theta)$, but drastically different high-SNR
capacity behavior.

Consider, e.g., a discrete-time stationary frequency-flat
time-selective Rayleigh-fading channel subject to additive white
Gaussian noise (AWGN). Here, the channel statistics are fully
specified by the fading-process power spectral density (PSD) $c(\theta)$,
$\theta \in [-1/2, 1/2)$, and by the noise variance. The high-SNR
capacity of this channel turns out to depend on the measure $\mu$
of the support of the PSD. More specifically, let $\rho$ denote the
SNR; if $\mu < 1$, capacity behaves as $(1 - \mu)\log\rho$ in the
high-SNR regime [3]. The pre-log factor $(1 - \mu)$ quantifies the loss
in signal-space dimensions (relative to coherent capacity [4],
which behaves as $\log\rho$) due to the lack of channel knowledge
at the receiver.² For $\mu \ll 1$ this loss is negligible,
suggesting that, in this case, the realizations of the fading channel
can be learned at the receiver (at high SNR) by sacrificing a
negligible fraction of the signal-space dimensions available for
communication. If $\mu = 1$ and the fading process is regular,
i.e., $\int_{-1/2}^{1/2} \log c(\theta)\,d\theta > -\infty$, the high-SNR
capacity behaves as $\log\log\rho$ [7]. This double-logarithmic growth
behavior of capacity with SNR renders communication in the high-SNR
regime extremely power inefficient.
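
To make the contrast between the two growth laws concrete, the following minimal Python sketch evaluates the leading high-SNR terms $(1 - \mu)\log\rho$ and $\log\log\rho$ at a few SNR values. The function names, the choice $\mu = 0.01$, and the SNR grid are illustrative assumptions; the expressions are only the asymptotic leading terms (natural logarithms assumed), not capacity values.

```python
import math


def prelog_growth(rho: float, mu: float) -> float:
    """Leading high-SNR term (1 - mu) * log(rho), for support measure mu < 1."""
    if not 0.0 <= mu < 1.0:
        raise ValueError("mu must lie in [0, 1)")
    return (1.0 - mu) * math.log(rho)


def loglog_growth(rho: float) -> float:
    """Leading high-SNR term log(log(rho)), for a regular process (mu = 1)."""
    if rho <= 1.0:
        raise ValueError("rho must exceed 1 for log(log(rho)) to be defined")
    return math.log(math.log(rho))


def snr_db_to_linear(snr_db: float) -> float:
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


if __name__ == "__main__":
    # Hypothetical values: mu = 0.01 and three SNR points.
    for snr_db in (20.0, 40.0, 60.0):
        rho = snr_db_to_linear(snr_db)
        print(snr_db, prelog_growth(rho, 0.01), loglog_growth(rho))
```

The logarithmic term keeps growing linearly in the SNR expressed in dB, whereas the double-logarithmic term is nearly flat over the same range, which is the sense in which the second regime is power inefficient.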

As a consequence of the results just mentioned, we have
the following: consider two discrete-time stationary
Rayleigh-fading channels, the first one with PSD equal to $1/\Delta$ for
$\theta \in [-\Delta/2, \Delta/2]$ and 0 else ($0 < \Delta < 1$), and the second one with
PSD equal to $(1 - \epsilon)/\Delta$ for $\theta \in [-\Delta/2, \Delta/2]$ and $\epsilon/(1 - \Delta)$
else ($0 < \epsilon < 1$, see Fig. 1). These two channels will have
completely different high-SNR capacity behavior, no matter how
small $\epsilon$ is. Specifically, the capacity of the first channel behaves
as $(1 - \Delta)\log\rho$, whereas the capacity of the second one grows
as $\log\log\rho$. A result like this is clearly unsatisfactory from an
engineering point of view, as the measure of the support of a
PSD cannot be determined through channel measurements. Such
a sensitive dependency of the (high-SNR) capacity behavior
on the fine details of the channel model (by fine details we
mean details that, in the words of Slepian [8], have "... no direct
meaningful counterparts in the real world ...") should make one
question the usefulness of the discrete-time stationary channel
model itself, at least for high-SNR analyses. In the light of
this observation, an engineering-relevant problem is to assess
whether this sensitivity has a noticeable impact on capacity at
SNR values of practical interest. Unfortunately, this problem
is still largely open. For the stationary discrete-time case, an
attempt to characterize the capacity sensitivity was made in [9],
where, for a first-order Gauss-Markov channel process (a regular
process), the SNR beyond which capacity starts exhibiting a
sub-logarithmic growth in SNR is computed as a function of the
innovation variance $\Delta$ of the process. More specifically, it is
shown in [9] that for $\rho \gg 1$ and $\Delta \ll 1$ capacity grows as $\log\rho$
as long as $\rho < 1/\Delta$. In words, when the innovation variance is
small, the high-SNR capacity grows logarithmically in SNR up
to SNR values not exceeding $1/\Delta$. The main limitation of this
result lies in the fact that it is based on a highly specific channel
model, namely a first-order Gauss-Markov process, which is
fully described by a single parameter, the innovation variance.
Furthermore, it is difficult to relate this parameter to physical
channel quantities such as the channel Doppler spread.

² Results of the same nature as those reported in [3] were obtained previously
for the block-fading channel model (a non-stationary channel model) in [5], [6].

A more general approach is presented in [7], where the fading 
number, defined as the second term in the high-SNR expansion of 
capacity, is characterized for arbitrary discrete-time, stationary, 
regular fading channels. The fading number determines the rate 
after which the log log regime kicks in, and communication be- 
comes extremely power inefficient. Unfortunately, as illustrated 
in [10], it is, in general, not possible to relate the fading number 
to the SNR value at which the log log behavior comes into effect. 

The purpose of this paper is to characterize the sensitivity 
of capacity with respect to the channel model for the general 
class of continuous-time Rayleigh-fading linear time-varying 
(LTV) channels that satisfy the wide-sense stationary (WSS) 
and uncorrelated scattering (US) assumptions [11] and that 
are, in addition, underspread [12]. The Rayleigh-fading and the 
WSSUS assumptions imply that the statistics of the channel are 
fully characterized by its two-dimensional PSD, often referred 
to as the scattering function [11]; the underspread assumption is 
satisfied if the scattering function is "highly concentrated" in the 
delay-Doppler plane. Different definitions of the underspread 
property are available in the literature (e.g., in terms of the 
support area of the scattering function [1], [13] or in terms of its 
moments [14]). For the problem considered in this paper, it is 
crucial to adopt a novel definition of the underspread property 
(see Definition 1 in Section II-B), inspired by Slepian's treatment 
of finite-energy signals that are approximately time- and
band-limited [8]. Specifically, we shall say that a WSSUS channel is
underspread if its scattering function has only a fraction $\epsilon \ll 1$
of its volume outside a rectangle of area $\Delta_{\mathbb{H}} \ll 1$. This
novel definition of the underspread property encompasses the
underspread definitions previously proposed in the literature [13],
[1], [14] and generalizes them.

When $\epsilon = 0$, i.e., when the scattering function is compactly
supported, and $\Delta_{\mathbb{H}} \ll 1$, we expect, on the basis of the results
obtained in [7], [3] in the context of the stationary discrete-time
fading channel model, capacity to grow logarithmically in SNR.
Unfortunately, it is not possible to determine through channel
measurements whether a scattering function is compactly
supported or not, which motivates our novel underspread definition.
For the practically more relevant case $0 < \epsilon \ll 1$, we show that
the sub-logarithmic growth behavior kicks in only at very large
SNR. Our result is built on a lower bound on the capacity of
band-limited continuous-time WSSUS underspread
Rayleigh-fading channels that is explicit in the channel parameters $\Delta_{\mathbb{H}}$
and $\epsilon$. By comparing this lower bound to a trivial capacity upper
bound, namely, the capacity of a nonfading AWGN channel with
the same SNR and bandwidth, we find that, for all SNR values
of practical interest, the fading channel capacity is close³ to the
capacity of a nonfading AWGN channel (with the same SNR
and bandwidth). As a rule of thumb, this statement is true for
all SNR values in the range $\sqrt{\Delta_{\mathbb{H}}} \ll \rho \ll 1/(\Delta_{\mathbb{H}} + \epsilon)$. Hence,
we conclude that the fading channel capacity essentially grows
logarithmically in SNR for all SNR values of practical interest.
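
As a rough illustration of the rule of thumb, the short Python sketch below converts the two endpoints $\sqrt{\Delta_{\mathbb{H}}}$ and $1/(\Delta_{\mathbb{H}} + \epsilon)$ into dB. The function name and the numerical values in the demo are hypothetical; since the rule is stated with "much smaller than" relations, the SNR values it covers lie well inside these endpoints rather than at them.

```python
import math
from typing import Tuple


def rule_of_thumb_snr_range_db(delta_h: float, eps: float) -> Tuple[float, float]:
    """Endpoints (in dB) of sqrt(delta_h) << rho << 1 / (delta_h + eps)."""
    if not 0.0 < delta_h < 1.0:
        raise ValueError("delta_h must lie in (0, 1)")
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    lower = math.sqrt(delta_h)       # left endpoint of the range
    upper = 1.0 / (delta_h + eps)    # right endpoint of the range
    return 10.0 * math.log10(lower), 10.0 * math.log10(upper)


if __name__ == "__main__":
    # Hypothetical channel parameters, chosen only for illustration.
    low_db, high_db = rule_of_thumb_snr_range_db(delta_h=1e-4, eps=1e-4)
    print(f"endpoints: {low_db:.1f} dB and {high_db:.1f} dB")
```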

Information theoretic analyses of continuous-time channels 
are notoriously difficult. The standard approach is to discretize 
the continuous-time channel input-output (I/O) relation by pro- 
jecting the input and output signals onto the singular functions 
of the channel operator [15], [16]. This yields a diagonalized 
discretized I/O relation consisting of countably many scalar, 
non-interacting I/O relations. Unfortunately, this approach is 
not viable in our setting because random LTV channels have 
random singular functions, which are not known to transmit- 
ter and receiver in the noncoherent setting [1], [2]. We will 
nevertheless discretize the channel by constraining the input 
signal to lie in the span of an orthonormal Weyl-Heisenberg 
(WH) set, i.e., a set of time-frequency shifted versions of a 
given function, and by projecting the receive signal on the same 
set of functions. This guarantees that the resulting discretized 
channel inherits the (two-dimensional) stationarity property of 
the underlying continuous-time channel, a fact that is essential 
for our analysis. This approach is interesting in its own right, 
as it yields a mathematically tractable way of dealing with the 
mutual information between certain continuous-time random 
signals. 

In [1] a similar approach was used to obtain bounds on the
capacity of continuous-time Rayleigh-fading WSSUS
underspread channels at low SNR. These bounds are derived under
the assumption that the off-diagonal terms in the discretized I/O
relation can be neglected, which greatly simplifies the capacity
analysis. Whereas this simplification was shown in [2] to be
admissible at low SNR, it is unclear whether the off-diagonal
terms can be neglected at high SNR. We will therefore
explicitly account for the off-diagonal terms in the discretized I/O
relation by treating them as (signal-dependent) additive noise,
and thus obtain a firm lower bound on the capacity of the
underlying continuous-time channel. This lower bound yields
an information-theoretic criterion for the design of WH sets to
be used for pulse-shaped (PS) orthogonal frequency-division
multiplexing (OFDM) communication systems operating over
Rayleigh-fading WSSUS underspread channels. In particular,
the lower bound suggests that the WH set should be chosen
so as to optimally trade signal-space dimensions (available
for communication) for minimization of the power of the
off-diagonal terms in the resulting discretized I/O relation.

³ "Close" here means that the ratio between the capacity lower bound and the
capacity of a nonfading AWGN channel (with the same SNR and bandwidth)
exceeds 0.75.

Notation: Uppercase boldface letters denote matrices, and
lowercase boldface letters designate vectors. The Hilbert space
of complex-valued finite-energy signals is denoted as $\mathcal{L}^2(\mathbb{R})$;
furthermore, $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ stand for the inner product and the
norm in $\mathcal{L}^2(\mathbb{R})$, respectively. The set of positive real numbers is
denoted as $\mathbb{R}_+$ and the set of integers as $\mathbb{Z}$; $\mathbb{E}[\cdot]$ is the expectation
operator, $h(\cdot)$ denotes differential entropy, and $\mathbb{F}[\cdot]$ stands for the
Fourier transform. For two vectors $\mathbf{a}$ and $\mathbf{b}$ of equal dimension,
the Hadamard (element-wise) product is denoted as $\mathbf{a} \odot \mathbf{b}$. We
write $\mathrm{diag}\{\mathbf{x}\}$ for the diagonal matrix that has the elements of
the vector $\mathbf{x}$ on its main diagonal. The superscripts $^T$, $^*$, and $^H$
stand for transposition, element-wise conjugation, and Hermitian
transposition, respectively. The largest eigenvalue of a
Hermitian matrix $\mathbf{A}$ is denoted as $\lambda_{\max}\{\mathbf{A}\}$. For two functions $f(x)$
and $g(x)$, the notation $f(x) = \mathcal{O}(g(x))$, $x \to \infty$, means
that $\limsup_{x \to \infty} |f(x)/g(x)| < \infty$. Finally, $\delta[k]$ is defined
as $\delta[0] = 1$ and $\delta[k] = 0$ for $k \neq 0$. Throughout the paper,
we shall make use of the following projection operators acting
on $\mathcal{L}^2(\mathbb{R})$: the time-limiting operator $\mathbb{T}_D$, defined as

\[
(\mathbb{T}_D x)(t) =
\begin{cases}
x(t), & \text{if } |t| \le D/2 \\
0, & \text{otherwise}
\end{cases}
\]

which limits $x(t)$ to the interval $[-D/2, D/2]$, and the
frequency-limiting operator $\mathbb{B}_W$, defined as

\[
(\mathbb{B}_W x)(t) = \int_{t'} \frac{\sin[\pi W (t - t')]}{\pi (t - t')}\, x(t')\, dt'
\]

which limits the Fourier transform of $x(t)$ to the interval
$[-W/2, W/2]$.

II. System Model

A. Channel and Signal Model

The I/O relation of a continuous-time random LTV channel $\mathbb{H}$
can be written as [17]

\[
y(t) = \underbrace{(\mathbb{H} x)(t)}_{= r(t)} + w(t)
     = \int_{\tau} h_{\mathbb{H}}(t, \tau)\, x(t - \tau)\, d\tau + w(t). \tag{1}
\]

Here, $r(t)$ is the output signal in the absence of additive noise.
Following [15, Model 2], we assume that the stochastic input
signal $x(t)$:
i) is strictly band-limited to $W$ Hz according to

\[
X(f) = 0, \quad \text{for } |f| > W/2 \tag{2}
\]

with probability one, where $X(f) = \mathbb{F}[x(t)]$;
ii) is approximately time-limited to a duration of $D$ sec according to

\[
\mathbb{E}\bigl[\|\mathbb{T}_D x(t)\|^2\bigr] \ge (1 - \eta)\, \mathbb{E}\bigl[\|x(t)\|^2\bigr] \tag{3}
\]

where $0 < \eta \ll 1$;
iii) satisfies the average-power constraint

\[
(1/D)\, \mathbb{E}\bigl[\|x(t)\|^2\bigr] \le P. \tag{4}
\]

The constraints (2) and (3) capture the fact that we are dealing
with input signals that are strictly band-limited and essentially
time-limited. As pointed out in [15, p. 364], time limitation is
important as this allows for a physically meaningful definition
of transmission rate. Note that the strict bandwidth constraint (2)
implies that any nonzero $x(t)$ can be limited in time only in an
approximate sense [8], a consideration that justifies the form of
the constraint expressed in (3).

The signal $w(t)$ is a zero-mean proper AWGN process with
double-sided PSD equal to 1. Finally, the time-varying channel
impulse response $h_{\mathbb{H}}(t, \tau)$ is a zero-mean jointly proper Gaussian
(JPG) process in time $t$ and delay $\tau$ that satisfies the WSSUS
assumption

\[
\mathbb{E}\bigl[h_{\mathbb{H}}(t, \tau)\, h_{\mathbb{H}}^*(t', \tau')\bigr] = R_{\mathbb{H}}(t - t', \tau)\, \delta(\tau - \tau') \tag{5}
\]

and is independent of $w(t)$ and $x(t)$. As a consequence of the
JPG and the WSSUS assumptions, the time-delay correlation
function $R_{\mathbb{H}}(t, \tau)$ fully characterizes the channel statistics.

Often, it is convenient to describe the action of the channel $\mathbb{H}$
in domains other than the time-delay domain used in (1).
Specifically, we shall frequently work with the following alternative I/O
relation [cf. (1)], which is explicit in the channel delay-Doppler
spreading function $S_{\mathbb{H}}(\tau, \nu) = \int_{t} h_{\mathbb{H}}(t, \tau)\, e^{-j 2 \pi \nu t}\, dt$ according
to

\[
y(t) = \underbrace{\iint_{\tau, \nu} S_{\mathbb{H}}(\tau, \nu)\, x(t - \tau)\, e^{j 2 \pi \nu t}\, d\tau\, d\nu}_{= r(t)} + w(t).
\]

This alternative I/O relation leads to the following physical
interpretation: the noiseless output signal $r(t) = (\mathbb{H} x)(t)$ is
a weighted superposition of copies of the input signal $x(t)$ that
are shifted in time by the delay $\tau$ and in frequency by the Doppler
shift $\nu$. The spreading function is the corresponding weighting
function. In other words, the channel operator $\mathbb{H}$ can be
represented as a continuous weighted superposition of time-frequency
shift operators. Note that every "reasonable" linear operator
admits such a representation (see [18, Thm. 14.3.5] for a precise
mathematical formulation of this statement). As a consequence
of the WSSUS assumption, the spreading function $S_{\mathbb{H}}(\tau, \nu)$ is
uncorrelated in $\tau$ and $\nu$, i.e., we have

\[
\mathbb{E}\bigl[S_{\mathbb{H}}(\tau, \nu)\, S_{\mathbb{H}}^*(\tau', \nu')\bigr] = C_{\mathbb{H}}(\tau, \nu)\, \delta(\tau - \tau')\, \delta(\nu - \nu') \tag{6}
\]

where $C_{\mathbb{H}}(\tau, \nu)$ is the two-dimensional PSD of the channel
process, usually referred to as scattering function [17]. In the
remainder of the paper, we let the scattering function be
normalized in volume according to

\[
\iint_{\tau, \nu} C_{\mathbb{H}}(\tau, \nu)\, d\tau\, d\nu = 1. \tag{7}
\]



Another system function we shall need is the time-varying
transfer function

\[
L_{\mathbb{H}}(t, f) = \int_{\tau} h_{\mathbb{H}}(t, \tau)\, e^{-j 2 \pi f \tau}\, d\tau
\]

which, as a consequence of (5), is stationary in both time and
frequency:

\[
\mathbb{E}\bigl[L_{\mathbb{H}}(t, f)\, L_{\mathbb{H}}^*(t', f')\bigr] = B_{\mathbb{H}}(t - t', f - f'). \tag{8}
\]

Here, $B_{\mathbb{H}}(t, f)$ denotes the time-frequency correlation function
of the channel process, which is related to the scattering function
through a two-dimensional Fourier transform

\[
B_{\mathbb{H}}(t, f) = \iint_{\tau, \nu} C_{\mathbb{H}}(\tau, \nu)\, e^{j 2 \pi (t \nu - f \tau)}\, d\tau\, d\nu.
\]

For a more complete description of the WSSUS channel model,
the interested reader is referred to [17], [1].

B. A Robust Definition of Underspread Channels 

Qualitatively speaking, WSSUS underspread channels are
WSSUS channels with a scattering function that is highly
concentrated in the delay-Doppler plane [11]. For the case
where $C_{\mathbb{H}}(\tau, \nu)$ is compactly supported, the channel is said to
be underspread if the support area of $C_{\mathbb{H}}(\tau, \nu)$ is smaller than 1
(see for example [13], [1]). The compact-support assumption
on $C_{\mathbb{H}}(\tau, \nu)$, albeit mathematically convenient, is a fine detail
of the channel model in the terminology introduced in
Section I, because it is not possible to determine through channel
measurements whether $C_{\mathbb{H}}(\tau, \nu)$ is compactly supported or not.
However, the results discussed in Section I, in the context of
the stationary discrete-time fading channel model, imply a high
capacity sensitivity to whether the measure of the support of the
PSD is smaller than 1 or not. A similar sensitivity can be expected
for the continuous-time WSSUS channel model. To quantify this
sensitivity, we need to work with a more general underspread
definition. Specifically, we replace the underspread definition
based on the compact-support assumption by the following, more
robust and physically meaningful, assumption: we say that $\mathbb{H}$ is
underspread if $C_{\mathbb{H}}(\tau, \nu)$ has a small fraction of its total volume
outside a rectangle of area much smaller than 1. More precisely,
we have the following definition.

Definition 1: Let $\tau_0, \nu_0 \in \mathbb{R}_+$, $\epsilon \in [0, 1]$, and let $\mathcal{H}(\tau_0, \nu_0, \epsilon)$
be the set of all Rayleigh-fading WSSUS channels $\mathbb{H}$ with
scattering function $C_{\mathbb{H}}(\tau, \nu)$ satisfying

\[
\int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} C_{\mathbb{H}}(\tau, \nu)\, d\tau\, d\nu \ge 1 - \epsilon. \tag{9}
\]

We say that the channels in $\mathcal{H}(\tau_0, \nu_0, \epsilon)$ are underspread if
$\Delta_{\mathbb{H}} = 4 \tau_0 \nu_0 \ll 1$ and $\epsilon \ll 1$.

Note that it is possible to verify, through channel
measurements, whether a fading channel is underspread according to
Definition 1. Typical wireless channels are (highly) underspread,
with most of the volume of $C_{\mathbb{H}}(\tau, \nu)$ supported over a
rectangle of area $\Delta_{\mathbb{H}} < 10^{-3}$ for land-mobile channels, and $\Delta_{\mathbb{H}}$
as small as $10^{-7}$ for certain indoor channels with restricted
terminal mobility. Note that setting $\epsilon = 0$ in Definition 1
yields the compact-support underspread definition of [13], [1].
The moment-based underspread definition proposed in [14] is
subsumed by Definition 1 as well.
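
The check implied by Definition 1 can be sketched numerically. The following Python example is only an illustration under stated assumptions: the scattering function is given as samples on a uniform delay-Doppler grid, integrals are approximated by Riemann sums, and the function name `underspread_parameters` as well as the Gaussian-shaped test profile and all numerical values are hypothetical. It returns $\Delta_{\mathbb{H}} = 4\tau_0\nu_0$ and the smallest $\epsilon$ for which (9) holds on the grid.

```python
import numpy as np


def underspread_parameters(c, tau, nu, tau0, nu0):
    """Return (delta_h, eps) for samples c[i, j] of C_H(tau[i], nu[j]).

    Assumes uniform grids tau and nu; integrals are Riemann sums.
    """
    c = np.asarray(c, dtype=float)
    tau = np.asarray(tau, dtype=float)
    nu = np.asarray(nu, dtype=float)
    d_tau = tau[1] - tau[0]
    d_nu = nu[1] - nu[0]
    total = c.sum() * d_tau * d_nu          # total volume on the grid
    c = c / total                           # enforce the normalization (7)
    inside = (np.abs(tau)[:, None] <= tau0) & (np.abs(nu)[None, :] <= nu0)
    volume_inside = c[inside].sum() * d_tau * d_nu
    eps = 1.0 - volume_inside               # volume fraction outside the rectangle
    delta_h = 4.0 * tau0 * nu0              # area of the rectangle in (9)
    return delta_h, eps


if __name__ == "__main__":
    # Hypothetical Gaussian-shaped scattering function (delay in s, Doppler in Hz).
    tau = np.linspace(-20e-6, 20e-6, 401)
    nu = np.linspace(-200.0, 200.0, 401)
    c = np.exp(-0.5 * (tau[:, None] / 2e-6) ** 2) * np.exp(-0.5 * (nu[None, :] / 20.0) ** 2)
    print(underspread_parameters(c, tau, nu, tau0=10e-6, nu0=100.0))
```

Because such a profile is not compactly supported, $\epsilon$ is strictly positive for every finite rectangle, which is exactly the situation the definition is meant to cover.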

C. Band-Limitation at the Receiver

Even though $x(t)$ has bandwidth no larger than $W$, the signal
$r(t) = (\mathbb{H} x)(t)$ is, in general, not strictly band-limited, because
$\mathbb{H}$ can introduce arbitrarily large frequency dispersion. However,
if $\mathbb{H}$ is underspread in the sense of Definition 1, most of the
energy of $r(t)$ will be supported on a frequency band of size
$(W + 2\nu_0)$ Hz. We therefore assume that the output signal $y(t)$
is passed through an ideal low-pass filter of bandwidth
$(W + 2\nu_0)$ Hz, resulting in the filtered output signal

\[
y_f(t) = (\mathbb{B}_{W + 2\nu_0}\, y)(t). \tag{10}
\]

This filtering operation yields a band-limited WSSUS fading
channel.

III. Channel Capacity 

A. Outline of the Information-Theoretic Analysis 

We are interested in characterizing the ultimate limit on the
rate of reliable communication over the continuous-time fading
channel (1) in the noncoherent setting (i.e., the setting where
neither the transmitter nor the receiver knows the realization
of $\mathbb{H}$, but both know the statistics of $\mathbb{H}$). Two main difficulties
need to be overcome to obtain such a characterization. First, we
need to deal with continuous-time channels and signals, which
are notoriously difficult to analyze information-theoretically.
Second, our focus is on the noncoherent setting, for which,
even for simple discrete-time channel models, analytic capacity
characterizations are not available.

To overcome these difficulties we resort to bounds on capacity.
As (trivial) capacity upper bound, we take in Section III-C the
capacity of a band-limited Gaussian channel [15] with the same
average-power constraint as in (4) and bandwidth equal to
$(W + 2\nu_0)$. A capacity lower bound is obtained in Section IV through
the following two steps: first, we construct a discretized channel
whose capacity is proven to be a lower bound on the capacity
of the underlying continuous-time channel (1); then, we derive
a lower bound on the capacity of this discretized channel that
is explicit in the channel parameters $\Delta_{\mathbb{H}}$ and $\epsilon$. In Section V,
we then show that, for channels that are underspread according
to Definition 1, this lower bound is close to the AWGN-channel
capacity upper bound for all SNR values of practical interest,
thereby sandwiching the capacity of the band-limited
continuous-time fading channel tightly.

B. Mutual Information and Capacity for the Continuous-Time 
Channel 

Dealing with continuous-time channels requires a suitable
generalization of the definitions of mutual information and
capacity [19] to the continuous-time case. Such a generalization
can be found, e.g., in [20], [16, Ch. 8], and is reviewed here for
completeness.



To define capacity of the channel (1), we represent the complex
signals at the input and output of $\mathbb{H}$ in terms of projections onto
complete orthonormal sets for the underlying signal spaces. More
specifically, let $\{\phi_m(t)\}_{m=0}^{\infty}$ be a complete orthonormal set for
the space $\mathcal{L}^2(W)$ of signals with bandwidth no larger than $W$.
We can then describe $x(t) \in \mathcal{L}^2(W)$ uniquely in terms of the
projections

\[
x_m = \langle x(t), \phi_m(t) \rangle, \quad m = 0, 1, \ldots \tag{11}
\]

as $x(t) = \sum_{m} x_m \phi_m(t)$. Similarly, let $\{\phi'_m(t)\}_{m=0}^{\infty}$ be a
complete orthonormal set for $\mathcal{L}^2(W + 2\nu_0)$. The low-pass filtered
output signal $y_f(t) \in \mathcal{L}^2(W + 2\nu_0)$ in (10) can be described
uniquely in terms of the projections

\[
y_m = \langle y_f(t), \phi'_m(t) \rangle, \quad m = 0, 1, \ldots \tag{12}
\]

as $y_f(t) = \sum_{m} y_m \phi'_m(t)$. To define the mutual information
between $x(t)$ and $y_f(t)$, we need to impose a probability measure
on $x(t)$.⁴ Concretely, let $\mathcal{Q}(W, D, \eta, P)$ be the set of probability
measures on $x(t)$ that satisfy the bandwidth constraint (2),
the time-limitation constraint (3), and the average-power
constraint (4). Every probability measure in $\mathcal{Q}(W, D, \eta, P)$ induces
a corresponding probability measure on $\{x_m\}_{m=0}^{\infty}$. For a given
probability measure in $\mathcal{Q}(W, D, \eta, P)$, the mutual information
between $x(t)$ and $y_f(t)$ is defined as [16, Eq. (8.151)], [20, Def. 3,
Thm. 1.5]

\[
I(y_f(t); x(t)) = \lim_{M \to \infty} I(\mathbf{y}^M; \mathbf{x}^M)
\]

where $\mathbf{x}^M = [x_0\ x_1\ \ldots\ x_M]^T$, and, similarly,
$\mathbf{y}^M = [y_0\ y_1\ \ldots\ y_M]^T$. This definition turns out to be independent of
the complete orthonormal sets $\{\phi_m(t)\}_{m=0}^{\infty}$ and $\{\phi'_m(t)\}_{m=0}^{\infty}$
used [20, Thm. 1.5]. The capacity $C$ of the channel (1) can now
be defined as follows [16, Eq. (8.1.55)]:

\[
C = \lim_{D \to \infty} \frac{1}{D} \sup_{\mathcal{Q}(W, D, \eta, P)} I(y_f(t); x(t)). \tag{13}
\]

We conclude this section by noting that, by Fano's inequality, no 
rate above C is achievable [22]. However, whether the channel 
coding theorem applies to the general class of time-frequency 
selective fading channels considered in this paper is an open 
problem, even for the discrete-time case [23]. 

C. An Upper Bound on Capacity 

For underspread channels in $\mathcal{H}(\tau_0, \nu_0, \epsilon)$ (see Definition 1)
and input signals satisfying (2)-(4), we take as simple (yet tight,
in a sense to be specified in Section V) upper bound on (13) the
capacity of a (nonfading) band-limited AWGN channel with the
same average-power constraint as in (4) and bandwidth
$(W + 2\nu_0)$. More precisely, we show in Appendix A that $C \le C_{\mathrm{awgn}}$,
where

\[
C_{\mathrm{awgn}} = (W + 2\nu_0) \log\!\left( 1 + (1 - \eta)(1 - \epsilon) \frac{P}{W + 2\nu_0} \right)
+ (\eta + \epsilon - \eta\epsilon) P. \tag{14}
\]
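
For concreteness, the upper bound (14) can be evaluated with a few lines of Python. This is a minimal sketch: the function name and the numerical values are hypothetical, and the natural logarithm (rates in nats per second) is an assumption, since the base of the logarithm is not fixed in the text above.

```python
import math


def awgn_upper_bound(P: float, W: float, nu0: float, eta: float, eps: float) -> float:
    """Evaluate the right-hand side of (14), assuming natural logarithms."""
    bandwidth = W + 2.0 * nu0                    # receiver bandwidth W + 2*nu0
    kept = (1.0 - eta) * (1.0 - eps)             # factor multiplying P inside the log
    leaked = eta + eps - eta * eps               # equals 1 - kept
    return bandwidth * math.log(1.0 + kept * P / bandwidth) + leaked * P


if __name__ == "__main__":
    # Hypothetical parameters: 1 MHz bandwidth, 100 Hz Doppler, 20 dB SNR.
    W, nu0 = 1e6, 100.0
    P = 100.0 * (W + 2.0 * nu0)
    print(awgn_upper_bound(P, W, nu0, eta=1e-6, eps=1e-6))
```

For $\eta = \epsilon = 0$ the expression reduces to the familiar band-limited AWGN capacity $(W + 2\nu_0)\log(1 + P/(W + 2\nu_0))$.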



⁴ A probability measure on $x(t)$ is specified through the joint probability
measure of the $n$-tuples $(x(t_1), \ldots, x(t_n))$ for every $n \in \mathbb{N}$ and for every
choice of $t_1, \ldots, t_n \in \mathbb{R}$ [21, Sec. 25.2].

This result is based on [15, Thm. 2]. Differently from [15,
Eq. (20)], the second term on the right-hand side (RHS) of (14)
accounts not only for the approximate time-limitation of $x(t)$,
but also for the dispersive nature of $\mathbb{H}$.

It is now appropriate to provide a preview of the nature
of the results we are going to obtain. We will show that, as
long as $\Delta_{\mathbb{H}} \ll 1$ and $\epsilon \ll 1$, the capacity of every channel
in $\mathcal{H}(\tau_0, \nu_0, \epsilon)$, independently of whether its scattering function
is compactly supported or not, is close to the AWGN-channel
capacity $C_{\mathrm{awgn}}$ for all SNR values typically encountered in
practical wireless communication systems. To establish this
result, we derive, in the next section, a lower bound on (13).

IV. A Lower Bound on Capacity

A. Outline

As the derivation of the capacity lower bound presented in this
section consists of several steps, we start by providing an outline
of our proof strategy. The first step entails restricting the set
of input distributions in (13) to a subset of $\mathcal{Q}(W, D, \eta, P)$; this
clearly yields a lower bound on $C$. The subset of $\mathcal{Q}(W, D, \eta, P)$
we consider is described in Section IV-B and is obtained by
constraining the input signal $x(t)$ to lie in the span of an
orthonormal WH set (that is not necessarily complete for $\mathcal{L}^2(W)$).
The second step (see Section IV-C) consists of projecting the
corresponding output signal $y_f(t)$ onto the same orthonormal
WH set, an operation that further lower-bounds mutual
information, as seen by application of the data-processing inequality [20,
Thm. 1.4] (the orthonormal WH set is not necessarily complete
for $\mathcal{L}^2(W + 2\nu_0)$). As a result of these two steps, we obtain a
discretization of the I/O relation. The capacity of the corresponding
discretized channel, which is a lower bound on the capacity of the
underlying continuous-time channel, is further lower-bounded
in Section IV-E by treating the off-diagonal terms in the I/O
relation as (signal-dependent) additive noise. This finally yields
a lower bound on the capacity of the underlying continuous-time
channel that is explicit in the channel parameters $\Delta_{\mathbb{H}}$ and $\epsilon$.

B. A Smaller Set of Input Distributions

Let $g_{k,n}(t) = g(t - kT)\, e^{j 2 \pi n F t}$ and let

\[
(g, T, F) = \{ g_{k,n}(t) \}_{k, n \in \mathbb{Z}}
\]

be an orthonormal WH set, i.e., a set consisting of time-frequency
shifts (on a rectangular lattice) of a given pulse $g(t) \in \mathcal{L}^2(\mathbb{R})$.
Orthonormality of the WH set implies $TF \ge 1$, as a consequence
of [18, Cor. 7.5.1, Cor. 7.3.2]. We lower-bound $C$ by restricting
the input signals to be of the form

\[
x(t) = \sum_{k=-K}^{K} \sum_{n=-N}^{N} x[k, n]\, g_{k,n}(t) \tag{15}
\]

where $\{x[k, n]\}$ are random coefficients. To guarantee that $x(t)$
in (15) satisfies (2)-(4), we impose the following constraints on
$(g, T, F)$, $K$, $N$, and $\{x[k, n]\}$.



Fig. 2. Insertion of guard intervals. (Time axis marked at $-D/2$, $-K_x T/2$, $K_x T/2$, and $D/2$.)



1) Average-power constraint: To ensure that $x(t)$ in (15)
satisfies (4), it is sufficient to choose $K$ such that $(2K + 1)T < D$
(further restrictions on the choice of $K$ will be imposed in
Section IV-B3), and to require that the random variables $\{x[k, n]\}$
satisfy

\[
\sum_{k=-K}^{K} \sum_{n=-N}^{N} \mathbb{E}\bigl[|x[k, n]|^2\bigr] \le (2K + 1)\, T P. \tag{16}
\]

The constraint (16), together with the orthonormality of the set
$(g, T, F)$, implies that (4) is satisfied.

2) Bandwidth limitation: To ensure that $x(t)$ in (15)
satisfies (2), we require that $g(t)$ fulfills the following property.

Property 1: The function $g(t)$ is strictly band-limited with
bandwidth $F < W$.

Furthermore, we take $N = (N_x - 1)/2$ where $N_x = W/F$.
For simplicity of exposition, we shall assume, in the remainder
of the paper, that $N_x$ is an odd integer.

3) Time limitation: To ensure that $x(t)$ in (15) satisfies (3),
we impose two additional constraints. First, we require that $g(t)$
satisfies the following property.

Property 2: The function $g(t)$ is even and decays faster than
$1/t$, i.e.,

\[
g(t) = \mathcal{O}(1/t^{1+\mu}), \quad t \to \infty \tag{17}
\]

for some $\mu > 0$.

Second, we insert, in the interval $[-D/2, D/2]$, two guard
intervals. More specifically, for a given approximate duration
$D$ of the input signal $x(t)$ [we will later take $D \to \infty$
according to (13)], the interval $[-D/2, D/2]$ is divided up into
three parts (see Fig. 2): the interval $[-K_x T/2, K_x T/2]$, with
$K = (K_x - 1)/2$ in (15),⁵ supporting most of the energy of $x(t)$,
and two guard intervals $[-D/2, -K_x T/2]$ and $[K_x T/2, D/2]$,
each of length $K_g T = D/2 - K_x T/2$. This will ensure that (3)
is satisfied. We will let $K_x \to \infty$ as $D \to \infty$, with $K_g$ kept
constant. This guarantees that the fraction of time allocated to the
guard intervals vanishes as $D \to \infty$. For simplicity of notation,
we shall assume in the remainder of the paper that $K_g$ is an
integer. For fixed $\eta$ in (3), the decay property of $g(t)$ expressed
in (17) implies that one can choose $K_g$ (independent of $K$) so that
$x(t)$ in (15) satisfies (3). This statement is proven in Appendix B.
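
The bookkeeping behind the three constraints above can be sketched as follows. This Python example is only an illustration: the function and key names are hypothetical, the numerical values in the demo are made up, and the relation $K_x = D/T - 2K_g$ is obtained by rearranging $K_g T = D/2 - K_x T/2$.

```python
def _as_odd_integer(value: float, name: str) -> int:
    """Return value as an int, insisting that it is a positive odd integer."""
    rounded = round(value)
    if abs(value - rounded) > 1e-9 or rounded < 1 or rounded % 2 == 0:
        raise ValueError(f"{name} = {value} is not a positive odd integer")
    return int(rounded)


def wh_grid_parameters(W: float, F: float, D: float, T: float, K_g: int) -> dict:
    """Derive the index ranges in (15) from bandwidth, duration, and guard size."""
    N_x = _as_odd_integer(W / F, "N_x = W / F")                  # frequency slots
    K_x = _as_odd_integer(D / T - 2 * K_g, "K_x = D / T - 2 K_g")  # time slots
    return {
        "N_x": N_x,
        "N": (N_x - 1) // 2,
        "K_x": K_x,
        "K": (K_x - 1) // 2,
        "guard_fraction": 2 * K_g * T / D,   # share of [-D/2, D/2] left empty
        "dimensions": K_x * N_x,             # number of coefficients x[k, n]
    }


if __name__ == "__main__":
    # Hypothetical lattice with T * F = 1.25 and two guard slots per side.
    print(wh_grid_parameters(W=15.0, F=1.0, D=131.25, T=1.25, K_g=2))
```

With $K_g$ fixed, `guard_fraction` scales as $1/D$, in line with the statement that the guard intervals occupy a vanishing fraction of time as $D \to \infty$.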

We next show formally that our construction results in a
capacity lower bound. Fix an orthonormal WH set $(g, T, F)$
satisfying Properties 1 and 2. Furthermore, let $\mathcal{Q}_d$ be the set
of probability measures on $\{x[k, n]\}$ that satisfy (16). Every
probability measure in $\mathcal{Q}_d$ induces a probability measure on $x(t)$
in (15). We denote the corresponding set of probability measures
on $x(t)$ by $\mathcal{Q}_{\mathrm{WH}}(W, D, \eta, P)$. As just shown, $x(t)$ satisfies (2)-(4).
Hence, $\mathcal{Q}_{\mathrm{WH}}(W, D, \eta, P) \subset \mathcal{Q}(W, D, \eta, P)$ [recall that
$\mathcal{Q}(W, D, \eta, P)$ is the set of all probability measures that satisfy
(2)-(4)]. We can then lower-bound $C$ in (13) as follows:

\[
C = \lim_{D \to \infty} \frac{1}{D} \sup_{\mathcal{Q}(W, D, \eta, P)} I(y_f(t); x(t))
\ge \lim_{D \to \infty} \frac{1}{D} \sup_{\mathcal{Q}_{\mathrm{WH}}(W, D, \eta, P)} I(y_f(t); x(t)). \tag{18}
\]

Here, the inequality follows by restricting the supremization to
the smaller set $\mathcal{Q}_{\mathrm{WH}}(W, D, \eta, P)$.

C. The Discretized I/O Relation 

The second step in our approach is to project the output
signal $y_f(t)$ [resulting from the transmission of $x(t)$ in (15)]
onto the signal set $\{g_{k,n}(t)\}$ to obtain

\[
\begin{aligned}
y[k, n] &= \langle y_f, g_{k,n} \rangle \\
&\overset{(a)}{=} \underbrace{\langle \mathbb{H} g_{k,n}, g_{k,n} \rangle}_{= h[k, n]}\, x[k, n]
 + \sum_{l=-K}^{K} \sum_{\substack{m=-N \\ (l,m) \neq (k,n)}}^{N}
   \underbrace{\langle \mathbb{H} g_{l,m}, g_{k,n} \rangle}_{= p[l, m, k, n]}\, x[l, m]
 + \underbrace{\langle w, g_{k,n} \rangle}_{= w[k, n]} \\
&= h[k, n]\, x[k, n]
 + \sum_{l=-K}^{K} \sum_{\substack{m=-N \\ (l,m) \neq (k,n)}}^{N} p[l, m, k, n]\, x[l, m]
 + w[k, n]
\end{aligned} \tag{19}
\]

for each time-frequency slot $(k, n)$, $k = -K, -K + 1, \ldots, K$,
$n = -N, -N + 1, \ldots, N$. Here, (a) is a consequence of
Property 1, which implies that the Fourier transform of $g_{k,n}(t)$ (with
$k = -K, -K + 1, \ldots, K$, $n = -N, -N + 1, \ldots, N$) is strictly
supported in the interval $[-W/2, W/2]$. We refer to the channel
with I/O relation (19) as the discretized channel induced by the
WH set $(g, T, F)$. As we assumed that $h_{\mathbb{H}}(t, \tau)$ in (1) is a
zero-mean JPG random process in $t$ and $\tau$, the random variables
$h[k, n]$ and $p[l, m, k, n]$ are zero-mean JPG. Furthermore, the
orthonormality of the WH set $(g, T, F)$ implies that the $w[k, n]$
in (19) are i.i.d. $\mathcal{CN}(0, 1)$.

For each time slot $k \in \{-K, -K+1, \ldots, K\}$, we arrange
the data symbols $x[k,n]$, the output signal samples $y[k,n]$, the
channel coefficients $h[k,n]$, and the noise samples $w[k,n]$ in
corresponding $N_x$-dimensional vectors. For example, the
$N_x$-dimensional vector that contains the input symbols in the $k$th
time slot is defined as

$$
\mathbf{x}[k] = \big[ x[k,-N] \;\; x[k,-N+1] \;\; \cdots \;\; x[k,N] \big]^T.
$$

The output vector $\mathbf{y}[k]$, the channel vector $\mathbf{h}[k]$, and the noise
vector $\mathbf{w}[k]$ are defined analogously. To get a compact notation,
we further stack $K_x$ contiguous input, output, channel, and
noise vectors into corresponding $K_xN_x$-dimensional vectors.
For example, for the channel input this results in the
$K_xN_x$-dimensional vector

$$
\mathbf{x} = \big[ \mathbf{x}^T[-K] \;\; \mathbf{x}^T[-K+1] \;\; \cdots \;\; \mathbf{x}^T[K] \big]^T. \tag{20}
$$



We assume that $K_x$ is an odd integer.

Recall that $K_x = 2K + 1$ and $N_x = 2N + 1$.



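Again, the stacked vectors $\mathbf{y}$, $\mathbf{h}$, and $\mathbf{w}$ are defined analogously.
Finally, we arrange the self-interference terms $p[l,m,k,n]$ in
a $K_xN_x \times K_xN_x$ matrix $\mathbf{P}$ with entries

$$
[\mathbf{P}]_{n+kN_x,\; m+lN_x} =
\begin{cases}
p[l-K,\, m-N,\, k-K,\, n-N], & \text{if } (l,m) \neq (k,n) \\
0, & \text{otherwise}
\end{cases}
$$

for $l, k = 0, 1, \ldots, K_x - 1$ and $m, n = 0, 1, \ldots, N_x - 1$.
With these definitions, we can now compactly express the I/O
relation (19) as

$$
\mathbf{y} = \mathbf{h} \odot \mathbf{x} + \mathbf{P}\mathbf{x} + \mathbf{w}. \tag{21}
$$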
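The equivalence between the slot-wise relation (19) and the stacked relation (21) can be illustrated numerically. The following is a minimal sketch, not the authors' simulation: the helper names `flat`, `crandn`, and `simulate` are hypothetical, and the coefficients are drawn independently at random purely to exercise the indexing (in the actual channel, $h[k,n]$ and $p[l,m,k,n]$ are correlated).

```python
import math
import random


def flat(k, n, K, N):
    """Map slot (k, n), k in [-K, K], n in [-N, N], to the stacked index."""
    N_x = 2 * N + 1
    return (n + N) + (k + K) * N_x


def crandn(rng):
    """One CN(0, 1) sample (illustrative stand-in for channel/noise draws)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def simulate(K=1, N=1, seed=0):
    rng = random.Random(seed)
    slots = [(k, n) for k in range(-K, K + 1) for n in range(-N, N + 1)]
    x = {s: crandn(rng) for s in slots}
    h = {s: crandn(rng) for s in slots}
    w = {s: crandn(rng) for s in slots}
    # p[(l, m, k, n)]: leakage of the symbol in slot (l, m) into slot (k, n).
    p = {(l, m, k, n): 0.1 * crandn(rng)
         for (l, m) in slots for (k, n) in slots if (l, m) != (k, n)}

    # Slot-wise relation, as in (19).
    y_slot = {}
    for (k, n) in slots:
        interference = sum(p[(l, m, k, n)] * x[(l, m)]
                           for (l, m) in slots if (l, m) != (k, n))
        y_slot[(k, n)] = h[(k, n)] * x[(k, n)] + interference + w[(k, n)]

    # Stacked relation, as in (21): y = h (.) x + P x + w.
    size = len(slots)
    xv, hv, wv = [0j] * size, [0j] * size, [0j] * size
    P = [[0j] * size for _ in range(size)]
    for (k, n) in slots:
        i = flat(k, n, K, N)
        xv[i], hv[i], wv[i] = x[(k, n)], h[(k, n)], w[(k, n)]
    for (l, m, k, n), value in p.items():
        P[flat(k, n, K, N)][flat(l, m, K, N)] = value  # row (k,n), column (l,m)
    yv = [hv[i] * xv[i] + sum(P[i][j] * xv[j] for j in range(size)) + wv[i]
          for i in range(size)]
    return y_slot, yv


y_slot, yv = simulate()
print(max(abs(y_slot[(k, n)] - yv[flat(k, n, 1, 1)]) for (k, n) in y_slot))
```

The row index of $\mathbf{P}$ corresponds to the receiving slot and the column index to the interfering slot, which is the convention that makes the matrix-vector product reproduce the double sum in (19).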



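Let now $C_d$ be the capacity of the discretized channel (21)
[induced by the WH set $(g, T, F)$] with $\mathbf{x}$ subject to the
average-power constraint (16). We can lower-bound the RHS of (18) by
$C_d$ as follows:

$$
C \overset{(a)}{\ge} \lim_{D \to \infty} \frac{1}{D} \sup_{\mathcal{Q}_{\mathrm{WH}}(W,D,\eta,P)} I(y_f(t); x(t))
\overset{(b)}{\ge} \lim_{K_x \to \infty} \frac{1}{(K_x + 2K_g)T} \sup_{\mathcal{Q}_d} I(\mathbf{y}; \mathbf{x}) \triangleq C_d. \tag{22}
$$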



Here, in (a) we used (18), and (b) is a consequence of [20, Thm. 
1.4], which extends the data processing inequality to continuous- 
time signals. To summarize, we showed that the capacity of the 
discretized channel (21) induced by the WH set (g, T, F) is a 
lower bound on the capacity of the underlying continuous-time 
channel (1). 

D. Why Weyl-Heisenberg Sets? 

The choice of constraining $x(t)$ to lie in the span of an
orthonormal WH set according to (15) results in a signaling
scheme that can be interpreted as PS-OFDM [24], where the 
data symbols x[k, n] are modulated onto a set of orthogonal 
signals indexed by discrete time (symbol index) k, and discrete 
frequency (subcarrier index) n. From this perspective, the self- 
interference term (the second term on the RHS of (19), which 
is made up of the off-diagonal terms in the I/O relation) can 
be interpreted as intersymbol and intercarrier interference. Dis- 
cretization through WH sets is sensible for the following two 
reasons. 

Stationarity: The structure of WH sets preserves the sta- 
tionarity of the channel in the discretization. More precisely, 
the channel gains h[k,n] in (19) inherit the two-dimensional 
stationarity property of the underlying continuous-time channel 
[see (8)], a fact that is crucial for the ensuing analysis. We prove
this result in Appendix C, where we also establish properties of
the statistics of $p[l,m,k,n]$ in (19) that will be needed in the
remainder of the paper.

Approximate diagonalization: The presence of the self- 
interference term in (19) makes the computation of Cd in (22) 
involved. A classic approach to eliminate self-interference is to 
discretize the channel by projecting the input and output signals 
onto the channel-operator singular functions [15], [16]. This 
choice is convenient, as it leads to a diagonal discretized I/O rela- 
tion, i.e., to countably many scalar, non-interacting I/O relations 



(see [2] for more details). Unfortunately, this approach is not 
viable in our setup, because in the LTV case the channel-operator 
singular functions are, in general, random and not known to 
transmitter and receiver (recall that we consider the noncoherent 
setting). Discretizing using deterministic orthonormal functions, 
as done in the previous section, yields self-interference, which we 
will need to take into account. This will be accomplished by treat- 
ing self-interference as additive noise, which will further lower- 
bound capacity. The main technical difficulty in this context 
arises from the self-interference term being signal-dependent. 
Moreover, as our capacity lower bound is obtained by treating 
self-interference as noise, ensuring that the power in the self- 
interference term is small (and, hence, that the discretized I/O 
relation is approximately diagonal) is crucial to get a good 
capacity lower bound. This can be accomplished by choosing
the pulse $g(t)$ to be well localized in time and frequency. In fact,
it was shown in [13], [25], [14], [1] that the singular functions
of random underspread operators can be well approximated by 
orthonormal WH sets generated by pulses that are well localized 
in time and frequency. 

E. A Lower Bound on the Capacity of the Discretized Channel 

We next derive a lower bound on Cd [and, hence, on C in (13)] 
by using a Gaussian input distribution, and by treating self- 
interference as (signal-dependent) noise. This lower bound — 
evaluated for an appropriately chosen WH set — will then be 
shown to be close (for all SNR values of practical interest) to the 
AWGN-channel capacity upper bound Cawgn in (14), whenever 
the channel is underspread according to Definition 1, thereby 
sandwiching the capacity of the underlying continuous-time 
channel tightly. 

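Our first result is a lower bound on $C_d$, which we indicate as
$L_1$, that is explicit in the power spectral density $\mathbf{C}(\theta)$ of the
multivariate stationary channel process $\{\mathbf{h}[k]\}$ with autocorrelation
function $\mathbf{R}[k' - k] = E\big[\mathbf{h}[k']\,\mathbf{h}^H[k]\big]$, where

$$
\mathbf{C}(\theta) \triangleq \sum_{k=-\infty}^{\infty} \mathbf{R}[k]\, e^{-j2\pi k\theta}, \qquad |\theta| \le \frac{1}{2}. \tag{23}
$$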



We then show in Corollary 3, Section IV-F that $L_1$ can be further
lower-bounded by an expression that is explicit in the channel
parameters $\Delta_{\mathbb{H}}$ and $\epsilon$ introduced in Definition 1.

Theorem 2: Let $(g, T, F)$ be an orthonormal WH set
satisfying Properties 1 and 2 in Section IV-B and consider a
Rayleigh-fading WSSUS channel (not necessarily underspread)
with scattering function $C_{\mathbb{H}}(\tau, \nu)$. For a given bandwidth $W$
and a given SNR $\rho = P/W$, the capacity of the discretized
channel (21) induced by $(g, T, F)$ is lower-bounded according
to $C_d(\rho) \ge L_1(\rho)$, where



Liip) 




c{e) ]de 



(24) 



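Here,

$$
h \sim \mathcal{CN}(0, 1)
$$

$$
r[0,0] \triangleq \iint C_{\mathbb{H}}(\tau, \nu)\, |A_g(\tau, \nu)|^2 \, d\tau\, d\nu
$$

$$
\sigma_I^2 \triangleq \sum_{k=-\infty}^{\infty} \sum_{\substack{n=-\infty \\ (k,n) \neq (0,0)}}^{\infty} \iint C_{\mathbb{H}}(\tau, \nu)\, |A_g(\tau - kT, \nu - nF)|^2 \, d\tau\, d\nu
$$

where $A_g(\tau, \nu)$ denotes the ambiguity function of $g(t)$ (see
Appendix C) and $\mathbf{C}(\theta)$, defined in (23), denotes the
matrix-valued power spectral density of the discretized channel induced
by $(g, T, F)$.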

Proof: See Appendix E. ■ 

F. A Lower Bound that is Explicit in the Channel Parameters $\Delta_{\mathbb{H}}$ and $\epsilon$

For the purposes of our analysis, it is convenient to further
lower-bound $L_1$ to get an expression that is explicit in the channel
parameters $\Delta_{\mathbb{H}}$ and $\epsilon$ introduced in Definition 1. The resulting
lower bound, presented in the next corollary, will allow us to
assess how sensitive capacity is to whether $C_{\mathbb{H}}(\tau, \nu)$ is compactly
supported or not.

Corollary 3: Let $(g, T, F)$ be an orthonormal WH set satisfying
Properties 1 and 2 in Section IV-B and consider a Rayleigh-fading
WSSUS channel (not necessarily underspread) in the
set $\mathcal{H}(\tau_0, \nu_0, \epsilon)$ with scattering function $C_{\mathbb{H}}(\tau, \nu)$. For a given
bandwidth $W$ and a given SNR $\rho = P/W$, and under the
technical condition Ah — 2i^qT < 1, the capacity of the 
discretized channel (21) induced by $(g, T, F)$ is lower-bounded
as $C_d(\rho) \ge L_2(\rho)$, where



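$$
L_2(\rho) \triangleq \frac{W}{TF} \Bigg\{ E_h\!\left[ \log\!\left( 1 + \frac{TF\rho\,(1-\epsilon)\, m_g |h|^2}{1 + TF\rho\,(M_g + \epsilon)} \right) \right]
- \inf_{0 < \alpha < 1} \Bigg[ \Delta_{\mathbb{H}} \log\!\left( 1 + \frac{TF\rho}{\alpha \Delta_{\mathbb{H}}} \right)
+ (1 - \Delta_{\mathbb{H}}) \log\!\left( 1 + \frac{TF\rho\,\epsilon}{\alpha (1 - \Delta_{\mathbb{H}})} \right)
+ \log\!\left( 1 + \frac{TF\rho}{1-\alpha}\,(M_g + \epsilon) \right) \Bigg] \Bigg\}. \tag{25}
$$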



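Here, $h \sim \mathcal{CN}(0, 1)$, $m_g \triangleq \min_{(\tau,\nu) \in \mathcal{D}} |A_g(\tau, \nu)|^2$, and

$$
M_g \triangleq \max_{(\tau,\nu) \in \mathcal{D}} \sum_{k=-\infty}^{\infty} \sum_{\substack{n=-\infty \\ (k,n) \neq (0,0)}}^{\infty} |A_g(\tau - kT, \nu - nF)|^2
$$

with $\mathcal{D} \triangleq [-\tau_0, \tau_0] \times [-\nu_0, \nu_0]$.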

Proof: See Appendix F. ■ 
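To get a feel for the size of the bound in Corollary 3, it can be evaluated numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it assumes that $L_2$ has the form (with prefactor $W/(TF)$) of an expected-log "coherent" term minus an infimum over $\alpha$ of three logarithmic penalty terms, and that the AWGN benchmark is $W[\log(1+(1-\epsilon)\rho)+\epsilon\rho]$. All function names are hypothetical, the expectation over $|h|^2 \sim \mathrm{Exp}(1)$ is computed by a plain trapezoidal rule, and the infimum is approximated on a finite grid. Rates are in nats and normalized by $W$.

```python
import math


def expected_log(a, steps=20000, u_max=50.0):
    """E[log(1 + a*|h|^2)] for h ~ CN(0,1), i.e. |h|^2 ~ Exp(1), in nats."""
    du = u_max / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * du
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.log1p(a * u) * math.exp(-u)
    return total * du


def penalty(alpha, snr, delta_h, eps, M_g):
    """Assumed form of the bracketed penalty term for a given alpha."""
    return (delta_h * math.log1p(snr / (alpha * delta_h))
            + (1.0 - delta_h) * math.log1p(snr * eps / (alpha * (1.0 - delta_h)))
            + math.log1p(snr / (1.0 - alpha) * (M_g + eps)))


def l2_over_w(rho, TF, delta_h, eps, m_g, M_g, grid=999):
    """Assumed L2 / W; the infimum over alpha is taken on a finite grid."""
    snr = TF * rho
    coherent = expected_log(snr * (1.0 - eps) * m_g / (1.0 + snr * (M_g + eps)))
    pen = min(penalty(i / (grid + 1), snr, delta_h, eps, M_g)
              for i in range(1, grid + 1))
    return (coherent - pen) / TF


def awgn_over_w(rho, eps):
    """Assumed AWGN benchmark C_awgn / W for W >> nu_0 and eta << 1."""
    return math.log1p((1.0 - eps) * rho) + eps * rho


# Hypothetical operating point: 10 dB SNR, m_g ~ 1 and M_g ~ Delta_H.
rho = 10.0
print(l2_over_w(rho, TF=1.02, delta_h=1e-4, eps=1e-6, m_g=1.0, M_g=1e-4)
      / awgn_over_w(rho, eps=1e-6))
```

Sweeping `rho` over a range of values and recording where the printed ratio crosses a chosen threshold is one way to locate an SNR interval over which the lower bound stays close to the AWGN benchmark.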

The lower bound $L_2$ in (25) depends on the seven quantities
$(\rho, g(t), T, F, \tau_0, \nu_0, \epsilon)$ and is therefore difficult to analyze. We
show next that if $T$ and $F$ are chosen so that $\nu_0 T = \tau_0 F$, a
condition often referred to as the grid matching rule [13, Eq.
(2.75)], two of these seven quantities can be dropped without
loss of generality.



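Lemma 4: Let $(g, T, F)$ be an orthonormal WH set satisfying
Properties 1 and 2 in Section IV-B. Then, for any $\beta > 0$, we
have

$$
L_2(\rho, g(t), T, F, \tau_0, \nu_0, \epsilon)
= L_2\!\left(\rho, \sqrt{\beta}\, g(\beta t), \frac{T}{\beta}, \beta F, \frac{\tau_0}{\beta}, \beta\nu_0, \epsilon\right).
$$

In particular, assume that $\nu_0 T = \tau_0 F$ and let $\beta = \sqrt{T/F} = \sqrt{\tau_0/\nu_0}$
and $\tilde{g}(t) = \sqrt{\beta}\, g(\beta t)$. Then,

$$
L_2(\rho, g(t), T, F, \tau_0, \nu_0, \epsilon)
= L_2\!\left(\rho, \tilde{g}(t), \sqrt{TF}, \sqrt{TF}, \sqrt{\Delta_{\mathbb{H}}}/2, \sqrt{\Delta_{\mathbb{H}}}/2, \epsilon\right)
\triangleq L_2^{(s)}(\rho, \tilde{g}(t), TF, \Delta_{\mathbb{H}}, \epsilon). \tag{26}
$$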



Proof: See Appendix G. ■ 

In (26), the superscript $(s)$ indicates that the scattering function
is supported on a square (with sidelength $\sqrt{\Delta_{\mathbb{H}}}$). In the
remainder of the paper, for the sake of simplicity of exposition, we
will choose $T$ and $F$ such that the grid matching rule $\nu_0 T = \tau_0 F$
is satisfied. Then, as a consequence of Lemma 4, we can (and
will) only consider WH sets of the form $(g, \sqrt{TF}, \sqrt{TF})$ and
WSSUS channels in the set $\mathcal{H}(\sqrt{\Delta_{\mathbb{H}}}/2, \sqrt{\Delta_{\mathbb{H}}}/2, \epsilon)$.
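The grid matching rule and the rescaling of Lemma 4 amount to a few lines of arithmetic. The sketch below is illustrative only: the function names and the numerical values of $\tau_0$, $\nu_0$, and $TF$ are hypothetical. It picks $T$ and $F$ with a prescribed product $TF$ such that $\nu_0 T = \tau_0 F$, and then applies the scaling $\beta = \sqrt{T/F}$ to obtain the equivalent square setting.

```python
import math


def matched_grid(tau0, nu0, TF):
    """Grid parameters with product TF that satisfy nu0 * T == tau0 * F."""
    T = math.sqrt(TF * tau0 / nu0)
    F = math.sqrt(TF * nu0 / tau0)
    return T, F


def square_parameters(tau0, nu0, T, F):
    """Apply the scaling beta = sqrt(T / F): (T/beta, beta*F, tau0/beta, beta*nu0)."""
    beta = math.sqrt(T / F)
    return T / beta, beta * F, tau0 / beta, beta * nu0


tau0, nu0, TF = 1e-6, 100.0, 1.02   # hypothetical: 1 us delay, 100 Hz Doppler
T, F = matched_grid(tau0, nu0, TF)
print(T, F, nu0 * T, tau0 * F)
print(square_parameters(tau0, nu0, T, F))
```

After rescaling, both grid parameters equal $\sqrt{TF}$ and both support half-widths equal $\sqrt{\Delta_{\mathbb{H}}}/2$ with $\Delta_{\mathbb{H}} = 4\tau_0\nu_0$, which is why only $TF$ and $\Delta_{\mathbb{H}}$ remain as free parameters.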

The lower bound $L_2^{(s)}$ in (26) can be tightened by maximizing
it over all WH sets $(g, \sqrt{TF}, \sqrt{TF})$ satisfying Properties 1
and 2 in Section IV-B. This maximization implicitly provides an
information-theoretic criterion for choosing $g(t)$ and $TF$.
Unfortunately, an analytic maximization of $L_2^{(s)}$ seems complicated as
the dependency of $m_g$ and $M_g$ on $(g, \sqrt{TF})$ is difficult
to characterize analytically. We shall therefore choose a specific
$g(t)$, detailed in the next section, and numerically maximize $L_2^{(s)}$
as a function of $TF$.

G. A Simple WH Set 

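We next construct a family of WH sets $(g, \sqrt{TF}, \sqrt{TF})$ that
satisfy Properties 1 and 2 in Section IV-B and have $g(t)$
real-valued. Take $1 < TF < 2$, let $C = \sqrt{TF}$, $\delta \triangleq TF - 1$,
and $G(f) = \mathcal{F}\{g(t)\}$. We choose $G(f)$ as the (positive) square
root of a raised-cosine pulse:

$$
G(f) =
\begin{cases}
\sqrt{C}, & \text{if } |f| \le \dfrac{1-\delta}{2C} \\[2mm]
\sqrt{\dfrac{C}{2}\big(1 + S(f)\big)}, & \text{if } \dfrac{1-\delta}{2C} \le |f| \le \dfrac{1+\delta}{2C} \\[2mm]
0, & \text{otherwise}
\end{cases} \tag{27}
$$

where $S(f) \triangleq \cos\!\left[ \dfrac{\pi C}{\delta} \left( |f| - \dfrac{1-\delta}{2C} \right) \right]$. As $(1+\delta)/(2C) = C/2$,
the function $G(f)$ is supported on an interval of length $C = \sqrt{TF}$.
Furthermore, $G(f)$ has unit norm, is real-valued and even,
and satisfies

$$
\sum_{n=-\infty}^{\infty} G(f - n/C)\, G(f - n/C - kC) = C\, \delta[k].
$$

By [26, Thm. 8.7.2], we can therefore conclude that the WH
set $(g(t), 1/\sqrt{TF}, 1/\sqrt{TF})$ is a tight WH frame for $\mathcal{L}^2(\mathbb{R})$,
and, by duality [27]-[29], the WH set $(g(t), \sqrt{TF}, \sqrt{TF})$ is
orthonormal. Finally, it can be shown that $g(t) = \mathcal{O}(1/t^2)$
whenever $TF > 1$.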
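The stated properties of the root-raised-cosine spectrum are easy to check numerically. The sketch below is a minimal illustration, assuming the standard root-raised-cosine form with flat level $\sqrt{C}$, roll-off $\delta = TF - 1$, and $C = \sqrt{TF}$; the function names and the midpoint-rule integration are choices made here, not part of the original construction.

```python
import math


def rrc_spectrum(f, TF):
    """Square root of a raised-cosine spectrum, assuming 1 < TF < 2."""
    C = math.sqrt(TF)
    delta = TF - 1.0
    a = abs(f)
    f1 = (1.0 - delta) / (2.0 * C)   # end of the flat part
    f2 = (1.0 + delta) / (2.0 * C)   # end of the roll-off region (= C / 2)
    if a <= f1:
        return math.sqrt(C)
    if a <= f2:
        S = math.cos(math.pi * C / delta * (a - f1))
        return math.sqrt(max(0.0, 0.5 * C * (1.0 + S)))
    return 0.0


def squared_norm(TF, steps=20000):
    """Midpoint-rule approximation of the integral of |G(f)|^2 over its support."""
    C = math.sqrt(TF)
    df = C / steps
    return sum(rrc_spectrum(-C / 2 + (i + 0.5) * df, TF) ** 2
               for i in range(steps)) * df


def periodized_energy(f, TF, terms=3):
    """Sum over n of |G(f - n/C)|^2, i.e. the k = 0 case of the frame condition."""
    C = math.sqrt(TF)
    return sum(rrc_spectrum(f - n / C, TF) ** 2 for n in range(-terms, terms + 1))


TF = 1.02
print(squared_norm(TF))                           # close to 1 (unit norm)
print(periodized_energy(0.5, TF), math.sqrt(TF))  # both close to C
```

For $k \neq 0$ the shifted copies $G(f - n/C)$ and $G(f - n/C - kC)$ do not overlap, because the support of $G(f)$ has length exactly $C$; only the $k = 0$ case therefore needs a numerical check.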
Fig. 3. Lower bounds $L_2^{(s)}$ normalized with respect to the upper bound $C_{\mathrm{awgn}}$.
The bounds are computed for WH sets based on the root-raised-cosine pulse (27),
for different values of the grid-parameter product TF. Ae = 10^* in (a) and 
Ah = 10-'' in (b). In both cases, e = IQ-^. 



V. Finite-SNR Analysis of the Lower Bound $L_2^{(s)}$

We now study the behavior of the lower bound $L_2^{(s)}$ in (26)
evaluated for the WH set constructed in the previous section,
under the assumption that the underlying channel is underspread
according to Definition 1, i.e., $\Delta_{\mathbb{H}} \ll 1$ and $\epsilon \ll 1$. Specifically,
we compare $L_2^{(s)}$ to the upper bound $C_{\mathrm{awgn}}$ in (14). To simplify
the comparison, we assume throughout this section that $W \gg \nu_0$
(a reasonable assumption for most wireless communication
systems of practical interest). Furthermore, in (3) we take $\eta \ll 1$.
Under these assumptions, we have

$$
C_{\mathrm{awgn}}(\rho) \approx W \big[ \log\!\big(1 + (1-\epsilon)\rho\big) + \epsilon\rho \big]. \tag{28}
$$



A. Trade-off between Self-Interference and Signal-Space Dimen- 
sions 

In Fig. 3, we plot ^2 /Cawgn for Ah — 10^'' and for Ah ~ 
10^^. In both cases, we take e ~ 10~^. The different curves 
correspond to different values of $TF$. We observe that the choice
$TF = 1$ is highly suboptimal. The reason for this suboptimality
is the poor time-frequency localization of $g(t)$ this choice entails.
In fact, when $TF = 1$, the pulse $g(t)$ reduces to a $(\sin t)/t$
function, which has poor time localization. This, in turn, yields
an ambiguity function $A_g(\tau, \nu)$ that is poorly localized in $\tau$, and,
hence, a small value for $m_g$ and a large value for $M_g$, i.e.,
a small signal-to-interference ratio (SIR) $m_g/M_g$; this leads
to a loose lower bound $L_2^{(s)}$ (recall that $L_2^{(s)}$ was obtained by
treating self-interference as noise). A value of $TF$ slightly larger
than 1 results in a significant improvement in the SIR $m_g/M_g$
(see Fig. 4), which is caused by the improved time localization
of $g(t)$. This, in turn, yields an improved lower bound $L_2^{(s)}$ for
all SNR values of practical interest, as shown in Fig. 3. A further
increase of the product $TF$ seems to be detrimental for all but
very high SNR values, where the ratio $L_2^{(s)}/C_{\mathrm{awgn}}$ is much
smaller than 1 anyway. The reason underlying this behavior
is as follows: in the regime where $L_2^{(s)}$ is close to $C_{\mathrm{awgn}}$, the
first term on the RHS of (25) dominates the other terms. But
in this regime, the first term on the RHS of (25) is essentially
linear in $W/(TF)$, which can be interpreted as the number
of signal-space dimensions available for communication. The
loss of signal-space dimensions incurred by choosing $TF$ much
larger than 1 quickly outweighs the SIR gain resulting from
improved time-frequency localization. Our numerical results
suggest that a value of $TF$ slightly larger than 1 optimally trades
signal-space dimensions for SIR maximization. We hasten to
add that this trade-off is a consequence of self-interference being
treated as (signal-dependent) noise in deriving our lower bound.

Fig. 4. Trade-off between the product $TF$ and the signal-to-interference ratio
$m_g/M_g$ for the root-raised-cosine WH set constructed in Section IV-G.



B. Sensitivity of Capacity to the Channel Parameters $\Delta_{\mathbb{H}}$ and $\epsilon$

The results presented in Fig. 3 suggest that, for $TF = 1.02$,
the lower bound $L_2^{(s)}$ is close to the AWGN-channel capacity
upper bound $C_{\mathrm{awgn}}$ over a large range of SNR values. To further
quantify this statement, we identify the SNR interval $[\rho_{\min}, \rho_{\max}]$
over which

$$
L_2^{(s)}(\rho) \ge 0.75\, C_{\mathrm{awgn}}(\rho). \tag{29}
$$

The corresponding interval end points $\rho_{\min}$ and $\rho_{\max}$, as a
function of $\Delta_{\mathbb{H}}$ and $\epsilon$, can easily be obtained numerically and are
plotted in Figs. 5 and 6, respectively, for $TF = 1.02$. For the WH
set and WSSUS underspread channels considered in this section,
we have $\rho_{\min} \in [-25\,\mathrm{dB}, -7\,\mathrm{dB}]$ and $\rho_{\max} \in [30\,\mathrm{dB}, 68\,\mathrm{dB}]$.
Hence, the interval $[\rho_{\min}, \rho_{\max}]$ covers all SNR values of
practical interest.

Fig. 5. Minimum SNR value $\rho_{\min}$ for which (29) holds, as a function of $\Delta_{\mathbb{H}}$
and $\epsilon$. The lower bound $L_2^{(s)}$ is evaluated for a WH set based on the
root-raised-cosine pulse (27); furthermore, $TF = 1.02$.

Fig. 6. Maximum SNR value $\rho_{\max}$ for which (29) holds, as a function of $\Delta_{\mathbb{H}}$
and $\epsilon$. The lower bound $L_2^{(s)}$ is evaluated for a WH set based on the
root-raised-cosine pulse (27); furthermore, $TF = 1.02$.

An analytic characterization of $\rho_{\min}$ and $\rho_{\max}$ seems
difficult. Insights on how these two quantities are related to the
channel parameters $\Delta_{\mathbb{H}}$ and $\epsilon$ can be obtained by the following
"back-of-the-envelope" analysis of $L_2^{(s)}$ (for $TF = 1.02$). We
first approximate $L_2^{(s)}$ by replacing $m_g$ and $M_g$ (whose
dependency on $\Delta_{\mathbb{H}}$ is difficult to characterize analytically) with simpler
expressions that are accurate when $\Delta_{\mathbb{H}} \ll 1$. Then, we determine
the SNR values for which the resulting approximate lower bound
is close to (28). We start by noting that, when $\Delta_{\mathbb{H}} \ll 1$, we can
approximate $m_g$ by its first-order Taylor-series expansion around
$\Delta_{\mathbb{H}} = 0$. This yields



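$$
m_g = \min_{(\tau,\nu) \in \mathcal{D}} |A_g(\tau, \nu)|^2 \approx 1 - c_m \Delta_{\mathbb{H}} \tag{30}
$$

where $\mathcal{D} = [-\sqrt{\Delta_{\mathbb{H}}}/2, \sqrt{\Delta_{\mathbb{H}}}/2] \times [-\sqrt{\Delta_{\mathbb{H}}}/2, \sqrt{\Delta_{\mathbb{H}}}/2]$
and $c_m = \pi^2 (T_g^2 + F_g^2)$ with

$$
T_g^2 \triangleq \int t^2 |g(t)|^2 \, dt, \qquad F_g^2 \triangleq \int f^2 |G(f)|^2 \, df.
$$

Recall that $\rho = P/W$.

To get (30), we used the Taylor-series expansion of $|A_g(\tau, \nu)|^2$
reported in [30, Sec. 6]. Similarly, for $\Delta_{\mathbb{H}} \ll 1$ we can
approximate $M_g$ as follows: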



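$$
M_g = \max_{(\tau,\nu) \in \mathcal{D}} \sum_{k=-\infty}^{\infty} \sum_{\substack{n=-\infty \\ (k,n) \neq (0,0)}}^{\infty} \big| A_g(\tau - k\sqrt{TF}, \nu - n\sqrt{TF}) \big|^2 \approx c_M \Delta_{\mathbb{H}} \tag{31}
$$

where

$$
c_M = \sum_{k=-\infty}^{\infty} \sum_{\substack{n=-\infty \\ (k,n) \neq (0,0)}}^{\infty} \big[ |a_{k,n}|^2 + |b_{k,n}|^2 \big] / 4
$$

with $a_{k,n}$ and $b_{k,n}$ being the first partial derivatives of $A_g(\tau, \nu)$
(with respect to $\nu$ and $\tau$, respectively) calculated at the points
$(-k\sqrt{TF}, -n\sqrt{TF})$:

$$
a_{k,n} = -j2\pi \int t\, g(t)\, g(t + k\sqrt{TF})\, e^{j2\pi n \sqrt{TF}\, t} \, dt
$$

$$
b_{k,n} = j2\pi \int f\, G(f - n\sqrt{TF})\, G(f)\, e^{-j2\pi k \sqrt{TF}\, f} \, df.
$$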
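Here, (31) is obtained by performing a Taylor-series expansion of
$A_g(\tau - k\sqrt{TF}, \nu - n\sqrt{TF})$ around the point $(\tau, \nu) = (0, 0)$ for
all $k$ and $n$, and by using that $g(t)$ is real and even. For our choice
of $TF = 1.02$ we have $c_m \approx 25.87$ and $c_M \approx 0.77$. Hence, (30)
and (31) suggest that when $\Delta_{\mathbb{H}} \ll 1$, we can approximate $m_g$
by 1 and $M_g$ by $\Delta_{\mathbb{H}}$. On the basis of these two approximations,
which are in good agreement with the numerical results reported
in Fig. 4, and the assumption that $\epsilon \ll 1$ and $TF = 1.02 \approx 1$,
we can approximate the lower bound $L_2^{(s)}$ for all SNR values
satisfying $\rho(\Delta_{\mathbb{H}} + \epsilon) \ll 1$ as follows:

$$
L_2^{(s)}(\rho) \approx W \left\{ E_h\!\left[ \log\!\left( 1 + \rho |h|^2 \right) \right] - \Delta_{\mathbb{H}} \log\!\left( 1 + \frac{\rho}{\Delta_{\mathbb{H}}} \right) \right\}. \tag{32}
$$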



The RHS of (32) is close to the AWGN-channel capacity upper
bound (apart from the Jensen penalty in the first term) for all
SNR values that satisfy $\rho \gg \sqrt{\Delta_{\mathbb{H}}}$. In fact, when $\rho \gg \sqrt{\Delta_{\mathbb{H}}}$
(and $\Delta_{\mathbb{H}} \ll 1$), the second term on the RHS of (32) can be
approximated as



^ Ah log 1 + 




Ah log p 
<logp 

which implies that, when $\rho \gg \sqrt{\Delta_{\mathbb{H}}}$ (and $\Delta_{\mathbb{H}} \ll 1$), the first
term on the RHS of (32) dominates the second term on the RHS
of (32).

We can therefore summarize our findings in the following rule
of thumb: the capacity of a Rayleigh-fading WSSUS underspread
channel with scattering function $C_{\mathbb{H}}(\tau, \nu)$ and parameters $\Delta_{\mathbb{H}}$
and $\epsilon$ in Definition 1 is close to $C_{\mathrm{awgn}}$ for all $\rho$ that satisfy
$\sqrt{\Delta_{\mathbb{H}}} \ll \rho \ll 1/(\Delta_{\mathbb{H}} + \epsilon)$, independently of whether $C_{\mathbb{H}}(\tau, \nu)$
is compactly supported or not, and independently of its shape.
In particular, this implies that capacity essentially grows
logarithmically with SNR up to SNR values $\rho \ll 1/(\Delta_{\mathbb{H}} + \epsilon)$. We
conclude by noting that the condition $\sqrt{\Delta_{\mathbb{H}}} \ll \rho \ll 1/(\Delta_{\mathbb{H}} + \epsilon)$
holds for all channels and SNR values of practical interest.
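The rule of thumb translates into a simple SNR window. The sketch below merely converts the two reference points $\sqrt{\Delta_{\mathbb{H}}}$ and $1/(\Delta_{\mathbb{H}} + \epsilon)$ to decibels; since the rule is stated with "much smaller than" relations, the usable range lies well inside these reference points. The function name and the parameter values are hypothetical.

```python
import math


def snr_window_db(delta_h, eps):
    """Reference points sqrt(Delta_H) and 1/(Delta_H + eps), both in dB."""
    lower = 10.0 * math.log10(math.sqrt(delta_h))
    upper = 10.0 * math.log10(1.0 / (delta_h + eps))
    return lower, upper


# Hypothetical channel parameters, for illustration only.
for delta_h, eps in ((1e-4, 1e-6), (1e-6, 1e-6)):
    print(delta_h, eps, snr_window_db(delta_h, eps))
```

A smaller spread $\Delta_{\mathbb{H}}$ widens the window at both ends, while $\epsilon$ only limits the upper end once it becomes comparable to $\Delta_{\mathbb{H}}$.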

VI. Conclusions 

We studied the noncoherent capacity of continuous-time 
Rayleigh-fading channels that satisfy the WSSUS and the under- 
spread assumptions. Our main result is a capacity lower bound 
obtained by (i) discretizing the continuous-time I/O relation 
and (ii) treating the (signal-dependent) self-interference term 
in the resulting discretized I/O relation as noise. Discretization 
is performed by constraining the input signal to lie in the span 
of an orthonormal WH set and by projecting the output signal 
onto the same orthonormal set. The resulting lower bound was 
shown to be close to the AWGN-channel capacity upper bound 
Cawgn for all SNR values of practical interest, as long as the 
underlying channel is underspread according to Definition 1 . In 
particular, this result implies that — for all SNR values typically 
encountered in real-world systems — the capacity of Rayleigh- 
fading underspread WSSUS channels is not sensitive to whether 
the channel scattering function is compactly supported or not. 
It also shows that — for all SNR values of practical interest — 
lack of channel knowledge at the receiver has little impact on 
the capacity of this class of channels. From a practical point of 
view, the underspread assumption is not restrictive as the fading 
channels commonly encountered in wireless communications 
are, in fact, highly underspread. 

On the basis of our capacity lower bound, we derived 
an information-theoretic criterion for the design of capacity- 
approaching WH sets to be used in PS-OFDM schemes. This 
criterion is more fundamental than criteria based on SIR maxi- 
mization (see [31] and references therein), because it sheds light 
on the trade-off between self-interference reduction and maxi- 
mization of the number of signal-space dimensions available for 
communication. Unfortunately, the corresponding optimization
problem is hard to solve, analytically as well as numerically. It
turns out, however, that the simple choice of taking $g(t)$ to be a
root-raised-cosine pulse and letting the grid-parameter product
$TF$ be close to 1 (but strictly larger than 1) yields a lower bound
that is close to $C_{\mathrm{awgn}}$ for all SNR values of practical interest.
In particular, this result suggests that, when self-interference
is treated as (signal-dependent) noise, the maximization of the
number of signal-space dimensions available for communication
should be prioritized over SIR maximization.

An interesting open problem, the solution of which would 
strengthen our results, is to compute an upper bound on the 
capacity of (1) by assuming perfect channel state information 
at the receiver. The main difficulty here lies in dealing with 
self-interference. In particular, we expect that nonstandard tools 
from large random matrix theory will be needed for this analysis. 
Recent results along these lines, for a specific channel model, 
can be found in [32]. 

Appendix A 
AWGN Capacity Upper Bound 

Let $\mathbb{H} \in \mathcal{H}(\tau_0, \nu_0, \epsilon)$. To establish that $C \le C_{\mathrm{awgn}}$, where
$C_{\mathrm{awgn}}$ is defined in (14), we start by upper-bounding the mutual
information on the RHS of (13) as follows:

$$
I(y_f(t); x(t)) \le I(y_f(t); r_f(t)). \tag{33}
$$

Here, $r_f(t) \triangleq (\mathbb{B}_{W+2\nu_0}\, r)(t)$, and the inequality follows by
noting that $x(t)$ and $y_f(t)$ are conditionally independent given
$r_f(t)$ and by using the data-processing inequality for
continuous-time random signals [20, Thm. 1.4]. If we now substitute (33)
into (13), we obtain

$$
C \le \lim_{D \to \infty} \frac{1}{D} \sup_{\mathcal{Q}(W,D,\eta,P)} I(y_f(t); r_f(t)). \tag{34}
$$

Q(W,D,ri,P) 



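The mutual information in (34) is between the input and the
output of a continuous-time band-limited AWGN channel. Hence,
we can establish an upper bound on the RHS of (34) by
invoking [15, Thm. 2], provided that an inequality, in the spirit
of (3), on the energy of the restriction of $r_f(t)$ to a certain time
interval can be established. More specifically, we shall show
next that the energy of the restriction of $r_f(t)$ to the interval
$[-D/2 - \tau_0, D/2 + \tau_0]$, i.e., the energy of $(\mathbb{T}_{D+2\tau_0}\, r_f)(t)$, is
bounded from below by $(1 - \eta)(1 - \epsilon)\, E[\|x(t)\|^2]$. Let

$$
x_f^{(\tau,\nu)}(t) \triangleq \big( \mathbb{B}_{W+2\nu_0} \big[ x(t - \tau)\, e^{j2\pi t\nu} \big] \big)(t).
$$

Using

$$
(\mathbb{T}_{D+2\tau_0}\, r_f)(t) =
\begin{cases}
\displaystyle\iint S_{\mathbb{H}}(\tau, \nu)\, x_f^{(\tau,\nu)}(t) \, d\tau\, d\nu, & \text{if } |t| \le D/2 + \tau_0 \\
0, & \text{otherwise}
\end{cases}
$$

we get

$$
\begin{aligned}
E\big[\|(\mathbb{T}_{D+2\tau_0}\, r_f)(t)\|^2\big]
&\overset{(a)}{=} \iint C_{\mathbb{H}}(\tau, \nu)\, E\!\left[ \int_{-D/2-\tau_0}^{D/2+\tau_0} \big| x_f^{(\tau,\nu)}(t) \big|^2 dt \right] d\tau\, d\nu \\
&\overset{(b)}{\ge} \int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} C_{\mathbb{H}}(\tau, \nu)\, E\!\left[ \int_{-D/2-\tau_0}^{D/2+\tau_0} \big| x_f^{(\tau,\nu)}(t) \big|^2 dt \right] d\tau\, d\nu
\end{aligned} \tag{35}
$$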



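where (a) follows from the WSSUS property of $\mathbb{H}$ [see (6)], and
(b) follows from the non-negativity of the integrand. Because
$x(t)$ is subject to the bandwidth constraint (2) and to the
time-concentration constraint (3), we have that, for every
$(\tau, \nu) \in [-\tau_0, \tau_0] \times [-\nu_0, \nu_0]$,

$$
E\!\left[ \int_{-D/2-\tau_0}^{D/2+\tau_0} \big| x_f^{(\tau,\nu)}(t) \big|^2 dt \right] \ge (1 - \eta)\, E\big[\|x(t)\|^2\big]. \tag{36}
$$

Substituting (36) into (35), we get

$$
\begin{aligned}
E\big[\|(\mathbb{T}_{D+2\tau_0}\, r_f)(t)\|^2\big]
&\ge (1 - \eta)\, E\big[\|x(t)\|^2\big] \int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} C_{\mathbb{H}}(\tau, \nu) \, d\tau\, d\nu \\
&\ge (1 - \eta)(1 - \epsilon)\, E\big[\|x(t)\|^2\big]
\end{aligned} \tag{37}
$$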
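where the last step follows from Definition 1. We now observe
that

$$
\begin{aligned}
E\big[\|r_f(t)\|^2\big] \le E\big[\|r(t)\|^2\big]
&= \iint C_{\mathbb{H}}(\tau, \nu)\, E\big[\|x(t - \tau)\, e^{j2\pi\nu t}\|^2\big] \, d\tau\, d\nu \\
&= \iint C_{\mathbb{H}}(\tau, \nu)\, E\big[\|x(t)\|^2\big] \, d\tau\, d\nu \\
&= E\big[\|x(t)\|^2\big].
\end{aligned} \tag{38}
$$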



for $k, l = 0, 1, \ldots, K_x - 1$ and $n, m = 0, 1, \ldots, N_x - 1$. Note
that $\mathbf{D}$ is Hermitian, by construction. We have that

$$
E\big[\|(\mathbb{I} - \mathbb{T}_D)\, x(t)\|^2\big] = E\big[\mathbf{x}^H \mathbf{D} \mathbf{x}\big] \le \lambda_{\max}\{\mathbf{D}\}\, E\big[\|\mathbf{x}\|^2\big].
$$

Here, the first equality follows by definition, and the inequality
follows by application of the Rayleigh-Ritz theorem [33,
Thm. 4.2.2]. We next use the Geršgorin disc theorem [33,
Cor. 6.1.5] to derive an upper bound on $\lambda_{\max}\{\mathbf{D}\}$ that is explicit
in the entries of $\mathbf{D}$:



Here, the last step follows from the normalization (7). The
inequality (38), combined with (37), yields the following
time-concentration inequality for $r_f(t)$ [cf. (3)]:

$$
E\big[\|\mathbb{T}_{D+2\tau_0}\, r_f(t)\|^2\big] \ge (1 - \eta)(1 - \epsilon)\, E\big[\|r_f(t)\|^2\big]. \tag{39}
$$

To obtain the desired upper bound (14), we now note that
every probability measure on $x(t)$ in the set $\mathcal{Q}(W, D, \eta, P)$
induces a probability measure on $r_f(t)$ (through the map
$r_f(t) = (\mathbb{B}_{W+2\nu_0} \mathbb{H}\, x)(t)$) that satisfies the following constraints [cf. (2)-(4)]:
i) the bandwidth of $r_f(t)$ is no larger than $W + 2\nu_0$,
ii) $E[\|r_f(t)\|^2] \le DP$, which follows from (38) and (4), and
iii) (39) holds.

Let $\mathcal{S}$ be the set of all probability measures on $r_f(t)$ satisfying
i)-iii). Note that the set of probability measures on $r_f(t)$ induced
by probability measures on $x(t)$ in $\mathcal{Q}(W, D, \eta, P)$ through the
map $r_f(t) = (\mathbb{B}_{W+2\nu_0} \mathbb{H}\, x)(t)$ is contained in $\mathcal{S}$, as shown
above. This property can be used to upper-bound the RHS of (34)
according to



$$
\lambda_{\max}\{\mathbf{D}\} \le \max_{k \in [-K, K],\; n \in [-N, N]} \sum_{l=-K}^{K} \sum_{m=-N}^{N} \big| d[k,n,l,m] \big|. \tag{41}
$$
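The Geršgorin-type bound used in (41) states that the largest eigenvalue of a Hermitian matrix is at most the largest absolute row sum. The sketch below illustrates this on a small real symmetric matrix; the matrix entries and function names are hypothetical, and the largest eigenvalue is estimated by plain power iteration rather than by any method used in the paper.

```python
def gershgorin_bound(A):
    """Largest absolute row sum, an upper bound on the largest eigenvalue."""
    return max(sum(abs(v) for v in row) for row in A)


def largest_eigenvalue(A, iters=500):
    """Power iteration for a real symmetric positive semidefinite matrix."""
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * Av[i] for i in range(n))   # Rayleigh quotient


A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]   # hypothetical stand-in for the matrix D
print(largest_eigenvalue(A), gershgorin_bound(A))
```

The bound is generally not tight, but it has the advantage of depending only on the magnitudes of the entries, which is what allows the subsequent entry-wise estimates of $|d[k,n,l,m]|$ to be used.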



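Each term on the RHS of (41) can be bounded as follows:

$$
\begin{aligned}
\big| d[k,n,l,m] \big| &= \left| \int_{|t| > D/2} g_{k,n}(t)\, g_{l,m}^*(t) \, dt \right| \\
&\le \int_{|t| > D/2} \big| g_{k,n}(t)\, g_{l,m}^*(t) \big| \, dt \\
&= \int_{|t| > D/2} \big| g(t - kT)\, g^*(t - lT) \big| \, dt.
\end{aligned}
$$

Recall that $D/2 = (K + K_g + 1/2)T$, by construction. As, by
assumption, $g(t)$ is even and satisfies $g(t) = \mathcal{O}(1/t^{1+\mu})$, there
exist constants $\gamma > 0$ and $t_0 > 0$ such that $|g(t)| \le \gamma / |t|^{1+\mu}$
for $|t| > t_0$. Hence, if we choose $K_g$ such that $K_gT > t_0$, we
get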



lini — 
D-i-oo D 



sup I{yf{t)]rf{t)) 

Q(W,D,ri,P) 

< lim 



\d[k, n, I, m\\ 



1 



sup/(y/(t);r/(t)). 
S 



< 



1 



1 



A direct application of [15, Thm. 2] yields (14). 

Appendix B 
The Input Signal (15) Satisfies (3) 

We show that for every orthonormal WH set satisfying
Properties 1 and 2 in Section IV-B and for every $\eta > 0$ and $D > 0$,
one can find a $K_g > 0$ such that the corresponding $x(t)$ in (15)
(with $K$ chosen as specified in Section IV-B3) satisfies (3). To
this end, it will turn out convenient to reformulate (3) as follows:



\t\>{K+Kg + l/2)T 

00 



i-ZcT^^ \t-lT\ 



1 



1 



+ 7 



\t - kT\^+^ \t - IT\ 

(K+Kg + l/2)T 

-(K+Kg + l/2)T 
2 



1 



1+M 



1 



1+M 



dt 



dt 



(a) . 

< r 



\t-kT\^+^ \t-lT\^+^ 
1 1 



dt 



$$
E\big[\|(\mathbb{I} - \mathbb{T}_D)\, x(t)\|^2\big] \le \eta\, E\big[\|x(t)\|^2\big] \tag{40}
$$



\t-KT\'^^ \t-lT\ 

(K+Kg + l/2)T 

-(K+Kg + l/2)T 



where $\mathbb{I}$ denotes the identity operator. Let $\mathbf{x}$ be the vector of
dimension $K_xN_x$ obtained by stacking the data symbols $x[k,n]$
as in (20). Furthermore, let



+ Y 



1 



l+A* 



1 



dt 



(b) 2 

= 7 



$$
d[k,n,l,m] \triangleq \int_{|t| > D/2} g_{k,n}(t)\, g_{l,m}^*(t) \, dt
$$

and define $\mathbf{D}$ to be the square matrix of dimension $K_xN_x \times
K_xN_x$ with entries



\t + KT\^+''\t-lT\'+^ 
1 1 



dt 



(K^ + l/2)T 



tl'^f" \t - {I - K)T\ 



l+p 



dt 



$$
[\mathbf{D}]_{n+kN_x,\; m+lN_x} = d[k-K,\, n-N,\, l-K,\, m-N]
$$



With slight abuse of notation, we used $\|\cdot\|$, a symbol which we reserved for
the norm in $\mathcal{L}^2(\mathbb{R})$, to denote the Euclidean norm in a finite-dimensional vector
space.

If $t_0 > D/2$, we let the guard interval cover the whole transmission time
$[-D/2, D/2]$. In this case (40) is trivially satisfied.



-{K, + 1/2)T 



+ Y 



ii+p 



|i|^^ \t-{l + K)T\ 



i+Ai 



dt. (42) 



capacity lower bound in Theorem 2. The first property concerns
the autocorrelation function of $h[k,n]$. Let the cross-ambiguity
function of two signals $f(t)$ and $g(t)$ be defined as [34]



Here, (a) follows by replacing $k$ by $K$ in the first term of the sum
and $k$ by $-K$ in the second term of the sum; these substitutions
lead to an upper bound; (b) follows by a simple change of
variables. Note now that, for $t > K_gT$, we have

K 2K 

V - =V — 



$$
A_{f,g}(\tau, \nu) \triangleq \int f(t)\, g^*(t - \tau)\, e^{-j2\pi\nu t} \, dt
$$



l=-K 



1=0 
2K 

oo 



1 



[{Kg + l)T]^+^ 



< 






(46) 



and let the ambiguity function of $g(t)$ be defined as $A_g(\tau, \nu) \triangleq
A_{g,g}(\tau, \nu)$. The autocorrelation function of $h[k,n]$ turns out
to be explicit in the ambiguity function of $g(t)$, as the following
calculation reveals:

E[h[k,n]h*[l,m]] 

= E[{Mgk.n, gk.n){^gi,m, 9l,m)*] 
(a) 



1=1 



~ 7 < OO 



(43) 



(b) 



where in the last step we used that $\mu > 0$ and, hence, the series
converges. Similarly, for $t < -K_gT$, we have

K 2K 

^^\t-il + K)T\'+^ i^^\t~lT\' 



Ch(t, i^)A* (t, i')Ag, ^ (t, v)dTdv 



C7h(t, v) \Ag{T, V)\^ e^-^^[ik-l)T^-{n-rr.)Fr]^^^^ 



r[k — l,n — m] 



(47) 



l=-K 



1=0 
2K 

^E 

1=0 



1 



[{Kg+l)TY^ 



(44) 



Here, (a) follows from Property 6 in Appendix D and because $\mathbb{H}$
is WSSUS [see (6)], while (b) follows from Property 5 in
Appendix D [see in particular (52)]. As a consequence of (47), we
have that $\{h[k,n]\}$ is stationary both in discrete time $k$ and in
discrete frequency $n$. The corresponding power spectral density
function is given by



Inserting (42) into (41) and using (43) and (44), we get 

K N 

E E \d[k,n,l,m]\ 



oo oo 



l=-Km=-N 
K N 

^E E 



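$$
c(\varphi, \theta) \triangleq \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} r[k,n]\, e^{-j2\pi(k\theta - n\varphi)}, \qquad |\varphi|, |\theta| \le 1/2. \tag{48}
$$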



l = ^Km=-N 



1 



1 



(K, + l/2)T 
{K, + l/2)T 



W 



1 



|t|'+^ \t-{l-K)T\ 
dt 



i+M 



dt 



The Fourier transform relation (48) together with the Poisson
summation formula allow us to relate $c(\varphi, \theta)$ to the channel
scattering function $C_{\mathbb{H}}(\tau, \nu)$ as follows



2,/ 



< 2(2iV + 1)7''7' 



\t\^+^\t-{l + K)T\^+'' 

OO 

1 



OO oo 



c(^,0)= Y. J2 e--''2-('^«-"^) 

/c— — OO n— — oo 

X // Ch(t, ly) \AgiT, i^f e^^^'^^'^-^^-UTdi^ 



ti+f^ 



dt. 



(K, + 1/2)T 

To summarize, we have the following upper bound on the RHS 

of (41): 



OO OO 



hH He. 



TF 



fc— — oo n— — oo 



if — n 9 — k 



F 



T 



OO 

A„ax{D}<2(27V+l)7V / ^ 



dt. 



(45) 



Ar, 



if — n 9 — k 



(49) 



{K, + 1/2)T 

The RHS of (45) can be made arbitrarily small by choosing $K_g$
sufficiently large. In other words, we can find a finite $K_g$ for
which the RHS of (45) is smaller than $\eta$. This concludes the
proof.

Appendix C 
Statistical Properties of the Channel Coefficients in (19)

We establish basic properties of the statistics of h[k, n] 
and p[l,m,k,n] in (19) that will be needed in the proof of the 



F ' T 
Another property we shall often use is

$$
r[0,0] = \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} c(\varphi, \theta) \, d\varphi\, d\theta
= \iint C_{\mathbb{H}}(\tau, \nu)\, |A_g(\tau, \nu)|^2 \, d\tau\, d\nu \le 1
$$

Basic results on the ambiguity function that will be needed in our analysis
are reviewed in Appendix D.
where the last step follows from Property 3 in Appendix D,
from the assumption that $g(t)$ has unit norm, and from the
normalization (7).

A characterization of the autocorrelation function
of $p[l,m,k,n]$ is possible, but not particularly insightful.
For our purposes, it will be sufficient to study the variance of
$p[l,m,k,n]$. As $p[l,m,k,n]$ has zero mean (see Section IV-C),
its variance is given by



Here, (a) follows from the change of variables $t' = t - \alpha$. As a
direct consequence of (51), we have that



$$
A_{g_{(\alpha,\beta)}}(\tau, \nu) = A_g(\tau, \nu)\, e^{-j2\pi(\alpha\nu - \beta\tau)}. \tag{52}
$$



Property 6: Let S'h(t, v) be the delay-Doppler spreading 
function of the channel H. Then, for g{t) e £^(M), and 



E 



\P[1, 



. f£. 



1 9 J) 



Sm{t, v)g(t - T)e=^^'^r{t)dTdvdt 



E 



(a) 



{b) 



\{^9l,m,9k,n)Y 



Ch(t,i/) |Ag,_„,g,_„(T,j^)| drdv 



Ch(t, u) \Ag{T + {l- k)T, v+{m- n)F)\^ drdv 



Sn{T,iy) 



f{t)g*{t-T)e-^^^'-'dt 



drdv 



j j Sn{T,v)A*jg{T,v)dTdv. 



u r 
_2r 



k,m ~ n] 



(50) 



where in (a) we used Property 6 in Appendix D together with 
the WSSUS property of H, and (b) follows from Property 5 
in Appendix D. 

Appendix D 
Properties of the Ambiguity Function 

We summarize properties of the (cross-) ambiguity function 
defined in (46) that are needed for our analysis. 

Property 3: For every function g{t) e £^(M), the ambigu- 
ity surface \Ag{T, i^)| attains its maximum at the origin, i.e., 

lAgiT,!^)^ < [Ag{0,0)Y == ||g(t)||4, for all T and i^. This 
property, as shown in [18, Lem. 4.2.1], follows directly from the 
Cauchy-Schwarz inequality. 

Property 4: Let g{t) G C^iR) and e(t) = y/j3g{(3t). Then 

Ae{T,i^)^ f e{t)e*{t-T)e-^^'"'*dt 



Appendix E 
Proof of Theorem 2 

We obtain a lower bound on Cd in (22) by evaluating the 
mutual information /(y;x) for a specific input distribution. 
In particular, we take x[k, n] to be i.i.d. JPG with zero mean 
and variance TFp for all k,n, so that the average-power con- 
straint (16) is satisfied. The corresponding input vector x is 
independent of h, P, and w. We use the chain rule for mutual 
information and the fact that mutual information is nonnegative 
to obtain the following standard lower bound: 



/(y;x) = /(y;x,h)-/(y;h|x) 

= /(y;h)+/(y;x|h)-/(y;h|x) 
>/(y;x|h)-/(y;h|x). 



(53) 



/? 9m9*m-r))e-^''"'*dt 



(a) 



g{z)g*{z~l3T)e-^^'"'^^l^dz 



= Ag[fiT, 



where (a) follows from the change of variables z — jit. 

Property 5: The cross-ambiguity function between the two 
time- and frequency-shifted versions 9[a.fi) (t) — 9{t^ a)e^^^^* 
and g(a',f3'){t) = 9{t - ayj^""^'* of g{t) e C^{M.) is given by 

g{t - a)e^^''^*g*{t - a' - T)e-^^''^''-''"-^e^^'^'"''dt 



The first term on the RHS of (53) can be interpreted as a 
"coherent" mutual information term (i.e., the mutual informa- 
tion between x and y under perfect knowledge of the channel 
realization at the receiver), while the second term can be inter- 
preted as quantifying the rate penalty due to the lack of channel 
knowledge [1]. 

1) The "Coherent" Term: The first term can be further lower- 
bounded as follows 

/(y;x|h) 

- h(x|h)-h(x|h,y) 

(a) 



(&) 



h(x)-h(x|h,y) 

K N 



ik.n) 
prec 



(c) 



^ ^ [h(:.[fc,n]|x, 

=-Kn=-N 

-h(^x[fc,n]|h,y,x(f,J' 

K N 

X! X! [h(x[fc,n])-h(^a;[fc,n] |h,y,X| 



{k,n) 
prec 



(^) gj27r/3'rg-j27r(i>+/3'-/3)a 



k=-Kn=-N 
K N 



X / g{t')g*{t' - {a' -a)- T)e-^^<''+f''-f'^'' dt' 



Ag{T + a — a,v + P — /?) 



(51) 



> ZZ zZ Hx[k,n])-h{x[k,n]\h[k,n],y[k,n\)\ 

k=-Kn=~N 
K N 

= ^ ^ I{y[k,n]]x[k,n]\h[k,n]). 

k=-Kn=-N 



Here, (a) follows because x and h are independent; (b) is a 
consequence of the chain rule for differential entropy [xprec* 
denotes the vector containing all entries of x up to and including 
the one before a; [fc, n]]. Next, (c) holds because x has i.i.d. entries, 
and (d) follows because conditioning reduces entropy. 

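We next seek a lower bound on $I(y[k,n];x[k,n]\,|\,h[k,n])$ that does not depend on $[k,n]$. Let $\tilde{w}[k,n]$ be the sum of the self-interference and noise terms in $y[k,n]$ [see (19)], i.e.,

$$
\tilde{w}[k,n] = \sum_{l=-K}^{K}\sum_{\substack{m=-N \\ (l,m)\neq(k,n)}}^{N} p[l,m,k,n]\,x[l,m] + w[k,n].
$$

Furthermore, let $\tilde{w}_G[k,n]$ be a proper Gaussian random variable that has the same variance as $\tilde{w}[k,n]$. It follows from [35, Lem. II.2] that $I(y[k,n];x[k,n]\,|\,h[k,n])$ does not increase if we replace $\tilde{w}[k,n]$ by $\tilde{w}_G[k,n]$. Hence,

$$
\begin{aligned}
I(y[k,n];x[k,n]\,|\,h[k,n])
&= I(h[k,n]x[k,n]+\tilde{w}[k,n];x[k,n]\,|\,h[k,n]) \\
&\ge I(h[k,n]x[k,n]+\tilde{w}_G[k,n];x[k,n]\,|\,h[k,n]) \\
&\overset{(a)}{=} \mathbb{E}_{h[k,n]}\!\left[\log\!\left(1+\frac{TF\rho\,|h[k,n]|^2}{\mathbb{E}\bigl[|\tilde{w}_G[k,n]|^2\bigr]}\right)\right] \\
&\overset{(b)}{=} \mathbb{E}_{h}\!\left[\log\!\left(1+\frac{r[0,0]\,TF\rho\,|h|^2}{\mathbb{E}\bigl[|\tilde{w}_G[k,n]|^2\bigr]}\right)\right]
\end{aligned}
\tag{54}
$$

where (a) follows because $x[k,n] \sim \mathcal{CN}(0,TF\rho)$, and (b) follows because $h[k,n] \sim \mathcal{CN}(0,r[0,0])$ [see (47)], so that we can replace $h[k,n]$ by $\sqrt{r[0,0]}\,h$, where $h \sim \mathcal{CN}(0,1)$. As the input symbols $x[k,n]$ are independent, and as $\mathbb{E}\bigl[|p[l,m,k,n]|^2\bigr] = \sigma_p^2[l-k,m-n]$ [see (50)], we have that

$$
\mathbb{E}\bigl[|\tilde{w}_G[k,n]|^2\bigr] = \mathbb{E}\bigl[|\tilde{w}[k,n]|^2\bigr]
= 1 + TF\rho\sum_{l=-K}^{K}\sum_{\substack{m=-N \\ (l,m)\neq(k,n)}}^{N}\sigma_p^2[l-k,m-n].
\tag{55}
$$

The nonnegativity of $\sigma_p^2[k,n]$ allows us to upper-bound (55) as follows:

$$
\begin{aligned}
\mathbb{E}\bigl[|\tilde{w}_G[k,n]|^2\bigr]
&\le 1 + TF\rho\sum_{l=-\infty}^{\infty}\sum_{\substack{m=-\infty \\ (l,m)\neq(k,n)}}^{\infty}\sigma_p^2[l-k,m-n] \\
&= 1 + TF\rho\sum_{l=-\infty}^{\infty}\sum_{\substack{m=-\infty \\ (l,m)\neq(0,0)}}^{\infty}\sigma_p^2[l,m] \\
&= 1 + TF\rho\,\sigma_I^2
\end{aligned}
\tag{56}
$$

where we set

$$
\sigma_I^2 \triangleq \sum_{l=-\infty}^{\infty}\sum_{\substack{m=-\infty \\ (l,m)\neq(0,0)}}^{\infty}\sigma_p^2[l,m].
\tag{57}
$$

If we now substitute (56) into (54), we obtain

$$
I(y[k,n];x[k,n]\,|\,h[k,n]) \ge \mathbb{E}_{h}\!\left[\log\!\left(1+\frac{r[0,0]\,TF\rho\,|h|^2}{1+TF\rho\,\sigma_I^2}\right)\right]
$$

and, consequently,

$$
I(\mathbf{y};\mathbf{x}\,|\,\mathbf{h}) \ge K_xN_x\,\mathbb{E}_{h}\!\left[\log\!\left(1+\frac{r[0,0]\,TF\rho\,|h|^2}{1+TF\rho\,\sigma_I^2}\right)\right].
\tag{58}
$$

2) The Penalty Term: We next seek an upper bound on the penalty term $I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x})$ in (53). The main difficulty lies in the self-interference term being signal-dependent. Our approach is to split $\mathbf{y}$ into a self-interference-free part and a self-interference-only part. Specifically, let $\mathbf{w}_1 \sim \mathcal{CN}(\mathbf{0},\alpha\mathbf{I})$ and $\mathbf{w}_2 \sim \mathcal{CN}(\mathbf{0},(1-\alpha)\mathbf{I})$, where $0 < \alpha < 1$, be two $K_xN_x$-dimensional independent JPG vectors.$^{11}$ Then,

$$
\mathbf{y} = \mathbf{h}\odot\mathbf{x} + \mathbf{P}\mathbf{x} + \mathbf{w}
= \underbrace{\mathbf{h}\odot\mathbf{x} + \mathbf{w}_1}_{\mathbf{y}_1} + \underbrace{\mathbf{P}\mathbf{x} + \mathbf{w}_2}_{\mathbf{y}_2}.
$$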
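The noise split used above relies on the fact that the sum of two independent proper Gaussian vectors with variances $\alpha$ and $1-\alpha$ is again proper Gaussian with unit variance, so that $\mathbf{y}_1 + \mathbf{y}_2$ has the same distribution as $\mathbf{y}$. The following Monte Carlo sketch checks this for scalar samples. The value of `alpha`, the sample size, and the helper name `complex_gaussian` are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def complex_gaussian(rng, variance, size):
    """Draw i.i.d. circularly symmetric complex Gaussian samples CN(0, variance)."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

rng = np.random.default_rng(0)
alpha = 0.3          # any value in (0, 1); hypothetical choice
num_samples = 200_000

w1 = complex_gaussian(rng, alpha, num_samples)        # noise assigned to y1
w2 = complex_gaussian(rng, 1.0 - alpha, num_samples)  # noise assigned to y2
w = w1 + w2                                           # should behave like CN(0, 1)

var_w = np.mean(np.abs(w) ** 2)   # empirical variance, expected close to 1
pseudo_var = np.mean(w ** 2)      # empirical pseudo-variance, expected close to 0

print(f"variance = {var_w:.4f}, |pseudo-variance| = {abs(pseudo_var):.4f}")
```

The parameter `alpha` only controls how the unit noise power is shared between the two parts; the total noise statistics are unaffected by its value.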



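By the data-processing inequality [19, Thm. 2.8.1] and the chain rule for mutual information, we have that

$$
I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x}) \le I(\mathbf{y}_1,\mathbf{y}_2;\mathbf{h}\,|\,\mathbf{x})
= I(\mathbf{y}_1;\mathbf{h}\,|\,\mathbf{x}) + I(\mathbf{y}_2;\mathbf{h}\,|\,\mathbf{x},\mathbf{y}_1).
\tag{59}
$$

As $\mathbf{h}$ is JPG, the first term on the RHS of (59) can be bounded as follows:

$$
\begin{aligned}
I(\mathbf{y}_1;\mathbf{h}\,|\,\mathbf{x})
&= I(\mathbf{h}\odot\mathbf{x}+\mathbf{w}_1;\mathbf{h}\,|\,\mathbf{x}) \\
&= \mathbb{E}_{\mathbf{x}}\!\left[\log\det\!\left(\mathbf{I}+\frac{1}{\alpha}\operatorname{diag}\{\mathbf{x}\}\,\mathbb{E}\bigl[\mathbf{h}\mathbf{h}^H\bigr]\operatorname{diag}\{\mathbf{x}^H\}\right)\right] \\
&\overset{(a)}{=} \mathbb{E}_{\mathbf{x}}\!\left[\log\det\!\left(\mathbf{I}+\frac{1}{\alpha}\operatorname{diag}\{\mathbf{x}^H\}\operatorname{diag}\{\mathbf{x}\}\,\mathbb{E}\bigl[\mathbf{h}\mathbf{h}^H\bigr]\right)\right] \\
&\overset{(b)}{\le} \log\det\!\left(\mathbf{I}+\frac{TF\rho}{\alpha}\,\mathbb{E}\bigl[\mathbf{h}\mathbf{h}^H\bigr]\right).
\end{aligned}
\tag{60}
$$

Here, (a) follows from the identity $\det(\mathbf{I}+\mathbf{A}\mathbf{B}^H) = \det(\mathbf{I}+\mathbf{B}^H\mathbf{A})$ for any pair of matrices $\mathbf{A}$ and $\mathbf{B}$ of appropriate dimensions [33, Thm. 1.3.20] and (b) is a consequence of Jensen's inequality.

For the second term on the RHS of (59) we note that

$$
\begin{aligned}
I(\mathbf{y}_2;\mathbf{h}\,|\,\mathbf{x},\mathbf{y}_1)
&= h(\mathbf{y}_2\,|\,\mathbf{x},\mathbf{y}_1) - h(\mathbf{y}_2\,|\,\mathbf{x},\mathbf{y}_1,\mathbf{h}) \\
&\overset{(a)}{=} h(\mathbf{y}_2\,|\,\mathbf{x},\mathbf{y}_1) - h(\mathbf{y}_2\,|\,\mathbf{x},\mathbf{h}) \\
&\overset{(b)}{\le} h(\mathbf{y}_2\,|\,\mathbf{x}) - h(\mathbf{y}_2\,|\,\mathbf{x},\mathbf{h},\mathbf{P}) \\
&\overset{(c)}{=} h(\mathbf{y}_2\,|\,\mathbf{x}) - h(\mathbf{y}_2\,|\,\mathbf{x},\mathbf{P}) \\
&= I(\mathbf{y}_2;\mathbf{P}\,|\,\mathbf{x}).
\end{aligned}
$$

Here, (a) holds because $\mathbf{y}_1$ and $\mathbf{y}_2$ are conditionally independent given $\mathbf{x}$ and $\mathbf{h}$, in (b) we used twice that conditioning reduces entropy, and (c) follows because $\mathbf{y}_2$ and $\mathbf{h}$ are conditionally independent given $\mathbf{P}$.

11 The role of $\alpha$ will become clear later.

Let $\mathbf{K}(\mathbf{x}) = \mathbb{E}_{\mathbf{P}}\bigl[\mathbf{P}\mathbf{x}\mathbf{x}^H\mathbf{P}^H\bigr]$ be the $K_xN_x \times K_xN_x$ conditional covariance matrix of the vector $\mathbf{P}\mathbf{x}$ given $\mathbf{x}$. We next upper-bound $I(\mathbf{y}_2;\mathbf{P}\,|\,\mathbf{x})$ as follows:

$$
\begin{aligned}
I(\mathbf{y}_2;\mathbf{P}\,|\,\mathbf{x})
&= I(\mathbf{P}\mathbf{x}+\mathbf{w}_2;\mathbf{P}\,|\,\mathbf{x}) \\
&\overset{(a)}{=} \mathbb{E}_{\mathbf{x}}\!\left[\log\det\!\left(\mathbf{I}+\frac{1}{1-\alpha}\mathbf{K}(\mathbf{x})\right)\right] \\
&\overset{(b)}{\le} \sum_{k=0}^{K_x-1}\sum_{n=0}^{N_x-1}\mathbb{E}_{\mathbf{x}}\!\left[\log\!\left(1+\frac{1}{1-\alpha}\bigl[\mathbf{K}(\mathbf{x})\bigr]_{(n+kN_x,\,n+kN_x)}\right)\right] \\
&\overset{(c)}{\le} \sum_{k=0}^{K_x-1}\sum_{n=0}^{N_x-1}\log\!\left(1+\frac{1}{1-\alpha}\mathbb{E}_{\mathbf{x}}\Bigl[\bigl[\mathbf{K}(\mathbf{x})\bigr]_{(n+kN_x,\,n+kN_x)}\Bigr]\right)
\end{aligned}
$$

where (a) follows because, given $\mathbf{x}$, the vector $\mathbf{P}\mathbf{x}$ is JPG, in (b) we used Hadamard's inequality, and (c) follows from Jensen's inequality. As the entries of $\mathbf{x}$ are i.i.d. with zero mean, we have that

$$
\begin{aligned}
\mathbb{E}_{\mathbf{x}}\Bigl[\bigl[\mathbf{K}(\mathbf{x})\bigr]_{(n+kN_x,\,n+kN_x)}\Bigr]
&= \mathbb{E}_{\mathbf{x}}\Bigl[\mathbb{E}_{\mathbf{P}}\Bigl[\bigl[\mathbf{P}\mathbf{x}\mathbf{x}^H\mathbf{P}^H\bigr]_{(n+kN_x,\,n+kN_x)}\Bigr]\Bigr] \\
&= TF\rho\sum_{l=-K}^{K}\sum_{\substack{m=-N \\ (l,m)\neq(k-K,n-N)}}^{N}\sigma_p^2[l-k+K,\,m-n+N] \\
&\le TF\rho\,\sigma_I^2
\end{aligned}
$$

where $\sigma_I^2$ was defined in (57). Hence,

$$
I(\mathbf{y}_2;\mathbf{P}\,|\,\mathbf{x}) \le K_xN_x\log\!\left(1+\frac{TF\rho}{1-\alpha}\sigma_I^2\right).
\tag{61}
$$

If we now substitute (60) and (61) into (59), we obtain

$$
I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x}) \le \log\det\!\left(\mathbf{I}+\frac{TF\rho}{\alpha}\,\mathbb{E}\bigl[\mathbf{h}\mathbf{h}^H\bigr]\right)
+ K_xN_x\log\!\left(1+\frac{TF\rho}{1-\alpha}\sigma_I^2\right).
\tag{62}
$$

3) Putting the Pieces Together: We substitute (58) and (62) into (53) and then (53) into (22) to get the following lower bound on capacity:

$$
\begin{aligned}
C(\rho) \ge{}& \frac{N_x}{T}\,\mathbb{E}_{h}\!\left[\log\!\left(1+\frac{r[0,0]\,TF\rho\,|h|^2}{1+TF\rho\,\sigma_I^2}\right)\right] \\
&- \lim_{K_x\to\infty}\frac{1}{(K_x+2K_g)T}\log\det\!\left(\mathbf{I}+\frac{TF\rho}{\alpha}\,\mathbb{E}\bigl[\mathbf{h}\mathbf{h}^H\bigr]\right)
- \frac{N_x}{T}\log\!\left(1+\frac{TF\rho}{1-\alpha}\sigma_I^2\right).
\end{aligned}
$$

Furthermore, as the bound holds for all $\alpha \in (0,1)$, we can tighten it according to

$$
\begin{aligned}
C(\rho) \ge{}& \frac{N_x}{T}\,\mathbb{E}_{h}\!\left[\log\!\left(1+\frac{r[0,0]\,TF\rho\,|h|^2}{1+TF\rho\,\sigma_I^2}\right)\right] \\
&- \inf_{0<\alpha<1}\left\{\lim_{K_x\to\infty}\frac{1}{(K_x+2K_g)T}\log\det\!\left(\mathbf{I}+\frac{TF\rho}{\alpha}\,\mathbb{E}\bigl[\mathbf{h}\mathbf{h}^H\bigr]\right)
+ \frac{N_x}{T}\log\!\left(1+\frac{TF\rho}{1-\alpha}\sigma_I^2\right)\right\}.
\end{aligned}
\tag{63}
$$

By direct application of [36, Thm. 3.4], an extension of Szegő's theorem (on the asymptotic eigenvalue distribution of Toeplitz matrices) to two-level Toeplitz matrices, we obtain

$$
\lim_{K_x\to\infty}\frac{1}{(K_x+2K_g)T}\log\det\!\left(\mathbf{I}+\frac{TF\rho}{\alpha}\,\mathbb{E}\bigl[\mathbf{h}\mathbf{h}^H\bigr]\right)
= \frac{1}{T}\int_{-1/2}^{1/2}\log\det\!\left(\mathbf{I}+\frac{TF\rho}{\alpha}\mathbf{C}(\theta)\right)d\theta.
$$

Substituting this expression into (63) and noting that [see (50)]

$$
\sigma_I^2 = \sum_{k=-\infty}^{\infty}\sum_{\substack{n=-\infty \\ (k,n)\neq(0,0)}}^{\infty}\sigma_p^2[k,n]
= \sum_{k=-\infty}^{\infty}\sum_{\substack{n=-\infty \\ (k,n)\neq(0,0)}}^{\infty}\int_{\nu}\int_{\tau} C_{\mathbb{H}}(\tau,\nu)\,|A_g(\tau-kT,\nu-nF)|^2\,d\tau\,d\nu
\tag{64}
$$

completes the proof.

Appendix F
Proof of Corollary 3

To prove the corollary we further bound each term in (24) separately.

a) The "log det" term: We start with an upper bound on the "log det" term on the RHS of (24). The matrix $\mathbf{C}(\theta)$ is Toeplitz [see (23)]. Hence, the entries on the main diagonal of $\mathbf{C}(\theta)$ are all equal. Let $c_0(\theta)$ denote one such entry; then

$$
c_0(\theta) \overset{(a)}{=} \sum_{k=-\infty}^{\infty} r[k,0]\,e^{-j2\pi k\theta}
\overset{(b)}{=} \int_{-1/2}^{1/2} c(\varphi,\theta)\,d\varphi, \qquad |\theta| \le 1/2.
\tag{65}
$$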



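Here, (a) follows from (23) and (47); (b) follows from (48) and by applying the Poisson summation formula. By Hadamard's inequality, we can upper-bound the "log det" term on the RHS of (24) as follows:

$$
\frac{1}{T}\int_{-1/2}^{1/2}\log\det\!\left(\mathbf{I}+\frac{TF\rho}{\alpha}\mathbf{C}(\theta)\right)d\theta
\le \frac{N_x}{T}\int_{-1/2}^{1/2}\log\!\left(1+\frac{TF\rho}{\alpha}c_0(\theta)\right)d\theta.
\tag{66}
$$

Let $\mathcal{B} \triangleq \{\theta : |\theta| \le \nu_0T\}$ and $\bar{\mathcal{B}} \triangleq \{\theta : \nu_0T < |\theta| \le 1/2\}$. We next use that $\nu_0T < 1/2$, by assumption, to first split the integral into two parts and then use Jensen's inequality on both terms to obtain

$$
\begin{aligned}
\frac{N_x}{T}\int_{-1/2}^{1/2}\log\!\left(1+\frac{TF\rho}{\alpha}c_0(\theta)\right)d\theta
={}& \frac{N_x}{T}\int_{\theta\in\mathcal{B}}\log\!\left(1+\frac{TF\rho}{\alpha}c_0(\theta)\right)d\theta
+ \frac{N_x}{T}\int_{\theta\in\bar{\mathcal{B}}}\log\!\left(1+\frac{TF\rho}{\alpha}c_0(\theta)\right)d\theta \\
\le{}& 2\nu_0N_x\log\!\left(1+\frac{F\rho}{2\nu_0\alpha}\int_{\theta\in\mathcal{B}}c_0(\theta)\,d\theta\right) \\
&+ \frac{N_x}{T}(1-2\nu_0T)\log\!\left(1+\frac{TF\rho}{(1-2\nu_0T)\alpha}\int_{\theta\in\bar{\mathcal{B}}}c_0(\theta)\,d\theta\right).
\end{aligned}
\tag{67}
$$

Let $\mathcal{F}(\tau,\nu) = C_{\mathbb{H}}(\tau,\nu)\,|A_g(\tau,\nu)|^2$. Note that

$$
\begin{aligned}
\int_{\theta\in\mathcal{B}}c_0(\theta)\,d\theta
&\overset{(a)}{=} \int_{-1/2}^{1/2}\int_{\theta\in\mathcal{B}} c(\varphi,\theta)\,d\theta\,d\varphi \\
&\overset{(b)}{=} \frac{1}{TF}\int_{-1/2}^{1/2}\int_{\theta\in\mathcal{B}}\sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty}\mathcal{F}\!\left(\frac{\varphi-n}{F},\frac{\theta-k}{T}\right)d\theta\,d\varphi \\
&\overset{(c)}{\le} \frac{1}{TF}\int_{-1/2}^{1/2}\int_{\theta\in\mathcal{B}}\sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty}C_{\mathbb{H}}\!\left(\frac{\varphi-n}{F},\frac{\theta-k}{T}\right)d\theta\,d\varphi \\
&\le \frac{1}{TF}\int_{-1/2}^{1/2}\int_{-1/2}^{1/2}\sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty}C_{\mathbb{H}}\!\left(\frac{\varphi-n}{F},\frac{\theta-k}{T}\right)d\theta\,d\varphi \\
&= \int_{\nu}\int_{\tau} C_{\mathbb{H}}(\tau,\nu)\,d\tau\,d\nu = 1
\end{aligned}
\tag{68}
$$

where (a) follows from (65), (b) follows from (49), and (c) follows from Property 3 in Appendix D. Similar steps lead to

$$
\int_{\theta\in\bar{\mathcal{B}}}c_0(\theta)\,d\theta \le \int_{|\nu|>\nu_0}\int_{\tau} C_{\mathbb{H}}(\tau,\nu)\,d\tau\,d\nu \le \epsilon
\tag{69}
$$

where the last step follows from (9). If we now substitute (68) and (69) into (67), insert the result into (66), set $\Delta'_{\mathbb{H}} = 2\nu_0T$, and use $W = N_xF$, we get

$$
\frac{1}{T}\int_{-1/2}^{1/2}\log\det\!\left(\mathbf{I}+\frac{TF\rho}{\alpha}\mathbf{C}(\theta)\right)d\theta
\le \frac{W\Delta'_{\mathbb{H}}}{TF}\log\!\left(1+\frac{TF\rho}{\alpha\Delta'_{\mathbb{H}}}\right)
+ \frac{W}{TF}(1-\Delta'_{\mathbb{H}})\log\!\left(1+\frac{TF\rho\,\epsilon}{\alpha(1-\Delta'_{\mathbb{H}})}\right).
\tag{70}
$$

b) Bounds on $r[0,0]$ and on $\sigma_I^2$: To further lower-bound the RHS of (24), we next derive a lower bound on $r[0,0]$ and an upper bound on $\sigma_I^2$; the resulting bounds are explicit in $\Delta'_{\mathbb{H}}$ and $\epsilon$, and in the ambiguity function of $g(t)$.

Let $\mathcal{D} \triangleq \{(\tau,\nu) \in [-\tau_0,\tau_0]\times[-\nu_0,\nu_0]\}$ be the rectangular area in the delay-Doppler plane that supports at least $1-\epsilon$ of the volume of $C_{\mathbb{H}}(\tau,\nu)$ according to (9). The following chain of inequalities holds:

$$
\begin{aligned}
r[0,0] &= \int_{\nu}\int_{\tau} C_{\mathbb{H}}(\tau,\nu)\,|A_g(\tau,\nu)|^2\,d\tau\,d\nu \\
&\ge \iint_{\mathcal{D}} C_{\mathbb{H}}(\tau,\nu)\,|A_g(\tau,\nu)|^2\,d\tau\,d\nu \\
&\ge \min_{(\tau,\nu)\in\mathcal{D}}\bigl\{|A_g(\tau,\nu)|^2\bigr\}\iint_{\mathcal{D}} C_{\mathbb{H}}(\tau,\nu)\,d\tau\,d\nu \\
&\ge \min_{(\tau,\nu)\in\mathcal{D}}\bigl\{|A_g(\tau,\nu)|^2\bigr\}(1-\epsilon).
\end{aligned}
\tag{71}
$$

We now seek an upper bound on $\sigma_I^2$. Let

$$
M(\tau,\nu) \triangleq \sum_{k=-\infty}^{\infty}\sum_{\substack{n=-\infty \\ (k,n)\neq(0,0)}}^{\infty}|A_g(\tau-kT,\nu-nF)|^2
$$

and note that

$$
\begin{aligned}
M(\tau,\nu) &\le \sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty}|A_g(\tau-kT,\nu-nF)|^2 \\
&= \sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty}\bigl|\langle g(t+\tau)e^{-j2\pi\nu t},\,g_{k,n}(t)\rangle\bigr|^2 \\
&\overset{(a)}{\le} \|g(t)\|^2 = 1
\end{aligned}
\tag{72}
$$

where (a) follows from Bessel's inequality [37, Thm. 3.4-6]. The following chain of inequalities holds:

$$
\begin{aligned}
\sigma_I^2 &= \int_{\nu}\int_{\tau} C_{\mathbb{H}}(\tau,\nu)\,M(\tau,\nu)\,d\tau\,d\nu \\
&= \iint_{\mathcal{D}} C_{\mathbb{H}}(\tau,\nu)\,M(\tau,\nu)\,d\tau\,d\nu + \iint_{\mathbb{R}^2\setminus\mathcal{D}} C_{\mathbb{H}}(\tau,\nu)\,M(\tau,\nu)\,d\tau\,d\nu \\
&\le \max_{(\tau,\nu)\in\mathcal{D}}\{M(\tau,\nu)\}\iint_{\mathcal{D}} C_{\mathbb{H}}(\tau,\nu)\,d\tau\,d\nu
+ \max_{(\tau,\nu)\in\mathbb{R}^2\setminus\mathcal{D}}\{M(\tau,\nu)\}\iint_{\mathbb{R}^2\setminus\mathcal{D}} C_{\mathbb{H}}(\tau,\nu)\,d\tau\,d\nu \\
&\overset{(a)}{\le} \max_{(\tau,\nu)\in\mathcal{D}}\{M(\tau,\nu)\} + \epsilon
\end{aligned}
\tag{73}
$$

where (a) follows from (72), (7), and (9).

The proof is completed by substituting (70) into (24), and using (71) and (73) in (24).

Appendix G
Proof of Lemma 4

To prove the lemma, we verify that after the substitutions

$$
e(t) = \sqrt{\beta}\,g(\beta t), \qquad \tilde{T} = T/\beta, \qquad \tilde{F} = \beta F, \qquad \tilde{\tau}_0 = \tau_0/\beta, \qquad \tilde{\nu}_0 = \beta\nu_0
$$

the lower bound $L_2$ in (25) does not change. Note first that $\tilde{T}\tilde{F} = TF$ and $\tilde{\nu}_0\tilde{T} = \nu_0T$. Furthermore, $\|e(t)\| = \|g(t)\| = 1$ and, by Property 4 in Appendix D, the orthonormality of $\{g_{k,n}(t)\}$ implies the orthonormality of $\{e(t-k\tilde{T})e^{j2\pi n\tilde{F}t}\}$. Let now $\tilde{\mathcal{D}} = [-\tilde{\tau}_0,\tilde{\tau}_0]\times[-\tilde{\nu}_0,\tilde{\nu}_0]$; we have that

$$
m_g = \min_{(\tau,\nu)\in\mathcal{D}}|A_g(\tau,\nu)|^2
= \min_{(\tau,\nu)\in\mathcal{D}}\left|A_e\!\left(\frac{\tau}{\beta},\beta\nu\right)\right|^2
= \min_{(\tau,\nu)\in\tilde{\mathcal{D}}}|A_e(\tau,\nu)|^2.
$$

Similarly, we have

$$
\begin{aligned}
M_g &= \max_{(\tau,\nu)\in\mathcal{D}}\sum_{k=-\infty}^{\infty}\sum_{\substack{n=-\infty \\ (k,n)\neq(0,0)}}^{\infty}|A_g(\tau-kT,\nu-nF)|^2 \\
&= \max_{(\tau,\nu)\in\mathcal{D}}\sum_{k=-\infty}^{\infty}\sum_{\substack{n=-\infty \\ (k,n)\neq(0,0)}}^{\infty}\left|A_e\!\left(\frac{\tau-kT}{\beta},\beta(\nu-nF)\right)\right|^2 \\
&= \max_{(\tau,\nu)\in\mathcal{D}}\sum_{k=-\infty}^{\infty}\sum_{\substack{n=-\infty \\ (k,n)\neq(0,0)}}^{\infty}\left|A_e\!\left(\frac{\tau}{\beta}-k\tilde{T},\beta\nu-n\tilde{F}\right)\right|^2 \\
&= \max_{(\tau,\nu)\in\tilde{\mathcal{D}}}\sum_{k=-\infty}^{\infty}\sum_{\substack{n=-\infty \\ (k,n)\neq(0,0)}}^{\infty}\bigl|A_e(\tau-k\tilde{T},\nu-n\tilde{F})\bigr|^2.
\end{aligned}
$$

To conclude, we note that for $\beta = \sqrt{T/F}$ and under the assumption $\nu_0T = \tau_0F$, we get $\tilde{T} = \tilde{F} = \sqrt{TF}$, and $\tilde{\tau}_0 = \tilde{\nu}_0 = \sqrt{\Delta_{\mathbb{H}}}/2$, which implies (26).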
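The scaling argument behind Lemma 4 can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a unit-norm Gaussian pulse, an arbitrary scaling factor `beta`, and hypothetical spread parameters `tau0` and `nu0`. It verifies Property 4, $A_e(\tau,\nu) = A_g(\beta\tau,\nu/\beta)$, at one point, and then compares the minimum of $|A_g|^2$ over $[-\tau_0,\tau_0]\times[-\nu_0,\nu_0]$ with the minimum of $|A_e|^2$ over the rescaled rectangle.

```python
import numpy as np

def g(t):
    """Unit-norm Gaussian pulse (illustrative choice, not prescribed by the paper)."""
    return 2.0 ** 0.25 * np.exp(-np.pi * t ** 2)

def make_scaled(pulse, beta):
    """Return the rescaled pulse e(t) = sqrt(beta) * pulse(beta * t)."""
    return lambda t: np.sqrt(beta) * pulse(beta * t)

def ambiguity(pulse, tau, nu, t):
    """Riemann-sum approximation of int pulse(t) pulse*(t - tau) e^{-j 2 pi nu t} dt."""
    dt = t[1] - t[0]
    integrand = pulse(t) * np.conj(pulse(t - tau)) * np.exp(-2j * np.pi * nu * t)
    return np.sum(integrand) * dt

def min_ambiguity_sq(pulse, tau_max, nu_max, t, points=9):
    """Minimum of |A|^2 on a uniform grid over [-tau_max, tau_max] x [-nu_max, nu_max]."""
    taus = np.linspace(-tau_max, tau_max, points)
    nus = np.linspace(-nu_max, nu_max, points)
    return min(abs(ambiguity(pulse, tau, nu, t)) ** 2 for tau in taus for nu in nus)

t = np.linspace(-10.0, 10.0, 8001)
beta, tau0, nu0 = 2.0, 0.2, 0.1   # hypothetical values
e = make_scaled(g, beta)

# Property 4 at a single (tau, nu) point.
tau, nu = 0.3, -0.4
lhs = ambiguity(e, tau, nu, t)
rhs = ambiguity(g, beta * tau, nu / beta, t)

# Minimum over the original rectangle versus the rescaled rectangle.
m_g = min_ambiguity_sq(g, tau0, nu0, t)
m_e = min_ambiguity_sq(e, tau0 / beta, beta * nu0, t)

print(abs(lhs - rhs), m_g, m_e)
```

The two grids are images of each other under $(\tau,\nu) \mapsto (\tau/\beta,\beta\nu)$, so the two minima agree up to numerical integration error.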
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